Scrapy on the Road to Python 3 Support

Scrapy is one of the few popular Python packages (almost 10k GitHub stars) that is not yet compatible with Python 3. The team and community around it are working to make it compatible as soon as possible. Here's an overview of what has happened so far.

Introducing JavaScript support for Portia

Note: Portia is no longer available for new users. It has been disabled for all new organisations since August 20, 2018.

Today we released the latest version of Portia, bringing with it the ability to crawl pages that require JavaScript. To celebrate this release, we are making Splash available as a free trial to all Portia users so you can try it out with your projects.

Distributed Frontera: Web Crawling at Scale

Welcome to Distributed Frontera

This past year, we have been working on a distributed version of our crawl frontier framework, Frontera. This work was partially funded by DARPA and is included in the DARPA Open Catalog.

The Road to Loading JavaScript in Portia

Note: Portia is no longer available for new users. It has been disabled for all new organisations since August 20, 2018.

Support for JavaScript has been a much-requested feature ever since Portia’s first release two years ago. The wait is nearly over, and we are happy to inform you that we will be launching these changes in the very near future. If you’re feeling adventurous, you can try it...