Browsed by
Author: Alexander Sibiryakov

Improved Frontera: Web Crawling at Scale with Python 3 Support

Python is our go-to language, and Python 2 is losing traction. To survive, older programs need to be Python 3 compatible. So we’re pleased to announce that Frontera will remain alive and kicking: it now fully supports Python 3! Frontera joins the ranks of Scrapy and Scrapy Cloud, so you can continue to quickly create and scale fully formed crawlers in your Python 3-ready stack. As a key web crawling toolbox…

Distributed Frontera: Web Crawling at Scale

Welcome to Distributed Frontera

This past year, we have been working on a distributed version of our crawl frontier framework, Frontera. This work was partially funded by DARPA and is included in the DARPA Open Catalog. The project came about when a client of ours expressed interest in building a crawler that could identify frequently changing hubs. Hubs are web pages that contain a large number of outgoing links to authority sites. For example, Reddit, the DMOZ Directory and Hacker…
