Use R to extract data from the internet.
This was a hybrid workshop (May 23, 2023, 13:00-14:00), held in person in the Breakout Room at NSCR and online via Zoom.
In this NSC-R workshop Danielle van Westbroek-Stibbe described and demonstrated web scraping in R. Web scraping involves the extraction of data from websites, which can be done manually (e.g., copy-pasting) or automatically. This process extends beyond existing datasets, allowing you to retrieve any type of data found on a webpage. Once the information is saved to disk in HTML format, we can parse it to become more readable.
In this workshop, Danielle demonstrated how we can use the R package “rvest” to extract data from the NSC-R Workshops website. During this demonstration, she created a comprehensive dataset that encompasses the workshops conducted with the NSC-R community.
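The workflow described above (save a page as HTML, parse it, and collect the pieces of interest into a dataset) can be sketched roughly as follows. This is a minimal illustration, not the code shown in the workshop: the HTML snippet, the `div.post` and `span.date` selectors, and the second "Example workshop" entry are all hypothetical stand-ins, and the real NSC-R Workshops site may be structured differently.

```r
library(rvest)

# Hypothetical HTML standing in for a saved copy of a workshop listing page.
# The tags, class names, and the second entry are assumptions for illustration.
html_source <- '
<html><body>
  <div class="post">
    <h2><a href="/posts/2023-05-23-webscraping/">Web scraping</a></h2>
    <span class="date">2023-05-23</span>
  </div>
  <div class="post">
    <h2><a href="/posts/example-post/">Example workshop</a></h2>
    <span class="date">2023-01-01</span>
  </div>
</body></html>'

# Save the HTML to disk, then parse the saved file.
path <- tempfile(fileext = ".html")
writeLines(html_source, path)
page <- read_html(path)

# Select one node per workshop, then pull the fields out of each node.
posts <- html_elements(page, "div.post")

workshops <- data.frame(
  title = html_text2(html_element(posts, "h2 a")),
  date  = as.Date(html_text2(html_element(posts, "span.date"))),
  link  = html_attr(html_element(posts, "h2 a"), "href"),
  stringsAsFactors = FALSE
)

print(workshops)
```

To work with a live page instead, `read_html()` also accepts a URL, in which case the selectors would need to be adapted to that page's actual markup (for example by inspecting it with the browser's developer tools).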
Content of the workshop:
Required packages:
To read a little about web scraping, please refer to this source.
Danielle van Westbroek-Stibbe is a PhD candidate at NSCR and Utrecht University and a member of the NSC-R Workshops team. Her research focuses on cybercriminal decision-making.
Everything that Danielle presented and discussed during the workshop is included in this Markdown
document, which you can open (and adapt if you wish) and knit
in the RStudio environment.
Alternatively, you can view the result directly as an HTML document here.
For attribution, please cite this work as
Westbroek-Stibbe (2023, May 23). NSC-R Workshops: Web scraping. Retrieved from https://nscrweb.netlify.app/posts/2023-05-23-webscraping/
BibTeX citation
@misc{westbroek-stibbe2023web,
  author = {Westbroek-Stibbe, Danielle van},
  title = {NSC-R Workshops: Web scraping},
  url = {https://nscrweb.netlify.app/posts/2023-05-23-webscraping/},
  year = {2023}
}