Scrapy is at the heart of Scrapinghub. We use this framework extensively and have accumulated a wide range of shortcuts to get around common problems. We’re launching a series to share these Scrapy tips with you so that you can get the most out of your daily workflow. Each post will feature two to three tips, so stay tuned.
Meet Tomás Rinke. He is the CTO and Co-Founder of RINAR Solutions, a startup that provides data consulting services to inform decision making. He is an avid Scrapy user and a Scrapinghub development partner. As an off-shoot of RINAR Solutions, he developed DataJudicial, an app that provides information on the legal sector.
This post kicks off a series of articles that will trace the prices of some of the top gifts, gadgets, and gizmos from Black Friday through to January 2016.
Scrapy is one of the few popular Python packages (almost 10k github stars) that's not yet compatible with Python 3. The team and community around it are working to make it compatible as soon as possible. Here's an overview of what has been happening so far.
At Scrapinghub we're always building and running large crawls–last year we had 11 billion requests made on Scrapy Cloud alone. Crawling millions of pages from the internet requires more sophistication than getting a few contacts of a list, as we need to make sure that we get reliable data, up to date lists of item pages and are able to optimise our crawl as much as possible.