Web scraping can look deceptively easy these days. There are numerous open-source libraries and frameworks, visual scraping tools, and data extraction services that make it easy to scrape data from a website. However, when you want to scrape websites at scale, things get tricky very fast.
Unbeknownst to many, there is a data revolution happening in finance.
Throughout the history of the financial markets, information has been power. The trader with access to the most accurate information can quickly gain an edge over the market.
Over the last couple of weeks, GDPR has brought data protection center stage. What was once a fringe concern for most businesses became, overnight, a burning problem that needed to be solved immediately.
It’s been another standout year for Scrapinghub and the scraping community at large. Together we crawled 79.1 billion pages (nearly double our 2016 total), with over 103 billion records scraped. What a year!
Up until now, your deployment process with Scrapy Cloud has probably looked something like this: code and test your spiders locally, commit and push your changes to a GitHub repository, and finally deploy them to Scrapy Cloud using shub deploy. However, keeping development and deployment as isolated steps can cause problems, such as unversioned and outdated code running...
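As a rough sketch of that pre-integration workflow (the spider name, commit message, and project ID below are illustrative placeholders, not details from the post):

    # 1. Develop and test the spider locally
    scrapy crawl quotes -o items.json

    # 2. Version the changes in a GitHub repository
    git add .
    git commit -m "Tweak quotes spider parsing"
    git push origin master

    # 3. Deploy the project to Scrapy Cloud with the shub CLI
    shub login            # prompts for your Scrapinghub API key (one-time setup)
    shub deploy 12345     # project ID, typically stored in scrapinghub.yml

Because steps 2 and 3 are independent, nothing stops you from deploying code that was never committed, which is exactly the unversioned-code problem described above.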