Why do we need to scrape data?
Data visualization is a powerful tool. Data is one of the most effective ways to make an argument, communicate an issue, and educate an audience. However, data is only useful when it's delivered in a way that makes it accessible to people unfamiliar with the material.
To prepare data for visualization, it needs to be organized and stored in a logical way. Data collection is often done through API calls. In that case, the data already arrives in a defined structure, ready to be loaded into some form of storage, whether that's a SQL database or a JSON object. Most of the data available to us, however, isn't exposed in a form that's easy to collect.
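To see what "already structured" means, here's a minimal sketch in Ruby of pulling from an API. The endpoint is hypothetical, but the point stands: the structure comes for free, and collection is just parsing the response.

```ruby
require 'net/http'
require 'json'

# Hypothetical endpoint — an API hands back data that is already
# structured, so collecting it is a one-line parse.
response = Net::HTTP.get(URI('https://api.example.com/v1/reports'))
reports  = JSON.parse(response) # => e.g. [{"id" => 1, "title" => "..."}, ...]
```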
Data scraping is the process of identifying a pattern in seemingly unorganized data and structuring it so it can be collected and stored.
From there, what you decide to do with the data is limitless.
For this workshop, we'll be using Nokogiri to parse HTML from a website and identify how data is patterned across HTML elements.
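Here's a minimal sketch of that idea. The URL and CSS selectors are placeholders, not from the workshop itself; the assumption is a page where each item repeats inside a `div.article` element.

```ruby
require 'nokogiri'
require 'open-uri'

# Fetch the page and parse it into a searchable document.
# (Hypothetical URL — swap in the page you actually want to scrape.)
doc = Nokogiri::HTML(URI.open('https://example.com/articles'))

# Suppose each article lives in <div class="article"> with an
# <h2 class="title"> inside — a repeating pattern we can target.
doc.css('div.article').each do |article|
  title = article.at_css('h2.title')&.text&.strip
  puts title
end
```

Once a repeating pattern like this is identified, each match can be mapped to a row or object and saved, which is exactly the structuring step described above.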
Key takeaways:
- All data that is accessible is consumable
- Consuming data allows us to format it in a way that is digestible for broader audiences
- Patterns can almost always be identified in unorganized data