It’s hard to believe our annual Shubber GetTogether is already over.
When it comes to web scraping, one key element is often overlooked until it becomes a big problem.
That is data quality.
Getting consistent, high-quality data when scraping the web is critical to the success of any web scraping project, particularly when scraping at scale or extracting mission-critical data where accuracy is paramount.
Data quality can be the difference between a...
Google Summer of Code (GSoC) was such a great experience for students like me. I learned so much about open source communities as well as contributing to their complex projects. I also learned a great deal from my mentors, Konstantin and Cathal, about programming and software engineering practices. In my opinion, the most valuable lesson I got from GSoC was what it was like to be a Software...
Unless you’ve been living under a rock for the past few months, you know that the EU’s General Data Protection Regulation (GDPR) is upon us.
It is the most comprehensive data protection law ever introduced, fundamentally changing the way companies can use the personal data of their customers and prospects.
There are countless articles and guides about how GDPR will affect your company’s...
Web scraping can look deceptively easy these days. There are numerous open-source libraries and frameworks, visual scraping tools and data extraction tools that make it very easy to scrape data from a website. However, when you want to scrape websites at scale, things start to get very tricky, very fast.
Unbeknownst to many, there is a data revolution happening in finance.
Throughout the history of the financial markets, information has been power. The trader with access to the most accurate information can quickly gain an edge over the market.
Over the last couple of weeks, GDPR has brought data protection center stage. What was once a fringe concern for most businesses became, overnight, a burning problem that needed to be solved immediately.
It’s been another standout year for Scrapinghub and the scraping community at large. Together we crawled 79.1 billion pages (nearly double 2016), with over 103 billion scraped records; what a year!