Scrapy on the Road to Python 3 Support

Scrapy is one of the few popular Python packages (almost 10k github stars) that's not yet compatible with Python 3. The team and community around it are working to make it compatible as soon as possible. Here's an overview of what has been happening so far.

Introducing Javascript support for Portia

Today we released the latest version of Portia bringing with it the ability to crawl pages that require JavaScript. To celebrate this release we are making Splash available as a free trial to all Portia users so you can try it out with your projects.

Distributed Frontera: Web Crawling at Scale

Welcome to Distributed Frontera

This past year, we have been working on a distributed version of our crawl frontier framework, Frontera. This work was partially funded by DARPA and is included in the DARPA Open Catalog.

The Road to Loading JavaScript in Portia

Support for JavaScript has been a much requested feature ever since Portia’s first release 2 years ago. The wait is nearly over and we are happy to inform you that we will be launching these changes in the very near future. If you’re feeling adventurous you can try it out on the develop branch at Github. This post aims to highlight the path we took to achieving JavaScript support in Portia.

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Learn More

Recent Posts