One of the things that takes more time when building a spider is reviewing the scraped data and making sure it conforms to the requirements and expectations of your client or team. This process is so time consuming that, in many cases, it ends up taking more time than writing the spider code itself, depending on how well the requirements are written. To make this process more efficient we have...
We often have to write spiders that need to login to sites, in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest.
Today we are introducing a new feature called Spider activity graphs. These allow you to visualize quickly how your spiders are working, and it's a very useful tool for busy projects to find out which spiders are not working as expected.
After 10 months of work, and many changes, we are pleased to announce the release of Scrapy 0.14.
In an effort to make Scrapy code smaller and more reusable, we’ve been working on splitting the Scrapy codebase into two different modules: