NSC-R Workshops: Web scraping

Danielle van Westbroek-Stibbe

This was a hybrid workshop (May 23, 2023, 13:00-14:00), held physically in the Breakout Room at NSCR and online on Zoom.

In this NSC-R workshop Danielle van Westbroek-Stibbe described and demonstrated web scraping in R. Web scraping is the extraction of data from websites, which can be done manually (e.g., by copy-pasting) or automatically. Because it is not limited to existing, ready-made datasets, it lets you retrieve essentially any type of data displayed on a webpage. Once a page has been downloaded and saved to disk in HTML format, it can be parsed so that the relevant content is extracted in a more structured, readable form.
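A minimal sketch of the save-then-parse step with rvest, assuming a small made-up HTML page rather than a real website. The page is written to a temporary file (standing in for a page saved to disk), parsed with `read_html()`, and readable text is then pulled out with CSS selectors via `html_element()`, `html_elements()` and `html_text2()`. The page content and selectors are illustrative only, and the native pipe `|>` requires R 4.1 or later.

```r
library(rvest)

# A small, made-up HTML page standing in for a downloaded webpage.
# (For a real page you might first save it with
#  download.file("<url>", destfile = "page.html").)
page_html <- '
<html>
  <body>
    <h1>Example page</h1>
    <p class="intro">Web scraping turns <b>web pages</b> into data.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  </body>
</html>'

# Save the raw HTML to disk, as you would after downloading a page
html_file <- tempfile(fileext = ".html")
writeLines(page_html, html_file)

# Parse the saved file into a document tree
page <- read_html(html_file)

# Extract readable text with CSS selectors:
# html_element() returns the first match, html_elements() returns all matches,
# and html_text2() converts the markup to plain text (dropping tags like <b>)
title <- page |> html_element("h1") |> html_text2()
intro <- page |> html_element("p.intro") |> html_text2()
items <- page |> html_elements("li") |> html_text2()

print(title)
print(intro)
print(items)
```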

In this workshop, Danielle demonstrated how to use the R package "rvest" to extract data from the NSC-R Workshops website. During this demonstration, she built a comprehensive dataset of the workshops held with the NSC-R community.
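The sketch below is not the code used in the workshop; it only illustrates how a dataset of workshops could be assembled with rvest and dplyr. It assumes a hypothetical listing page in which each workshop sits in a `div.post` element with `.title`, `.date` and `.author` children. The real NSC-R Workshops site may be structured differently, so its HTML should be inspected before choosing selectors. Apart from the first entry (taken from this page), the workshop data are placeholders.

```r
library(rvest)
library(dplyr)

# Hypothetical listing page: the element and class names are assumptions,
# and the second entry is placeholder data.
listing_html <- '
<div class="post">
  <h2 class="title">Web scraping</h2>
  <span class="date">2023-05-23</span>
  <span class="author">Danielle van Westbroek-Stibbe</span>
</div>
<div class="post">
  <h2 class="title">Placeholder workshop</h2>
  <span class="date">2023-04-18</span>
  <span class="author">Placeholder presenter</span>
</div>'

# Wrap the snippet in a minimal HTML document and parse it
page <- minimal_html(listing_html)

# One node per workshop post
posts <- page |> html_elements("div.post")

# html_element() on a set of nodes returns exactly one result per post
# (NA when a field is missing), so the columns stay aligned row by row
workshops <- tibble(
  title  = posts |> html_element(".title")  |> html_text2(),
  date   = posts |> html_element(".date")   |> html_text2(),
  author = posts |> html_element(".author") |> html_text2()
) |>
  mutate(date = as.Date(date)) |>  # convert the date text to a Date column
  arrange(date)                    # order workshops chronologically

print(workshops)
```

Selecting one container per workshop first, rather than collecting all titles and all dates separately, keeps each row's fields together even if a post lacks one of them.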

Content of the workshop:

What is web scraping and what is it used for?
HTML basics
Web scraping and parsing in R

Required packages:

For web scraping and parsing: rvest
For tidy coding: dplyr

For a short introduction to web scraping, please refer to this source.

Danielle van Westbroek-Stibbe is a PhD candidate at NSCR and Utrecht Universiteit and a member of the NSC-R Workshops team. Her research focuses on cybercriminal decision-making.

Materials

All elements that Danielle presented and discussed during the workshop are included in this R Markdown document, which you can open (and adapt if you wish) and knit in RStudio.

Alternatively, you can view the result directly as an HTML document here.

Citation

For attribution, please cite this work as

Westbroek-Stibbe (2023, May 23). NSC-R Workshops: Web scraping. Retrieved from https://nscrweb.netlify.app/posts/2023-05-23-webscraping/

BibTeX citation

@misc{westbroek-stibbe2023web,
  author = {Westbroek-Stibbe, Danielle van},
  title = {NSC-R Workshops: Web scraping},
  url = {https://nscrweb.netlify.app/posts/2023-05-23-webscraping/},
  year = {2023}
}