Category: Releases

Deploy your Scrapy Spiders from GitHub

Up until now, your deployment process using Scrapy Cloud has probably been something like this: code and test your spiders locally, commit and push your changes to a GitHub repository, and finally deploy them to Scrapy Cloud using shub deploy. However, keeping development and deployment as isolated steps can cause problems, such as unversioned and outdated code running in production. The good news is that, from now on, you can have your code automatically deployed…

Improved Frontera: Web Crawling at Scale with Python 3 Support

Python is our go-to language, and Python 2 is losing traction. To survive, older programs need to be Python 3 compatible. And so we’re pleased to announce that Frontera will remain alive and kicking because it now fully supports Python 3! With Frontera joining the ranks of Scrapy and Scrapy Cloud, you can officially continue to quickly create and scale fully formed crawlers without any issues in your Python 3-ready stack. As a key web crawling toolbox…

Introducing Scrapy Cloud with Python 3 Support

It’s the end of an era. Python 2 is on its way out with only a few security and bug fixes forthcoming from now until its official retirement in 2020. Given this withdrawal of support and the fact that Python 3 has snazzier features, we are thrilled to announce that Scrapy Cloud now officially supports Python 3. If you are new to Scrapinghub, Scrapy Cloud is our production platform that allows you to deploy, monitor, and scale your web scraping…

Introducing Portia2Code: Portia Projects into Scrapy Spiders

We’re thrilled to announce the release of our latest tool, Portia2Code! With it you can convert your Portia 2.0 projects into Scrapy spiders. This means you can add your own functionality and use Portia’s friendly UI to quickly prototype your spiders, giving you much more control and flexibility. A perfect example of where you may find this new feature useful is when you need to interact with the web page. You can convert your Portia project to Scrapy, and then…

Introducing the Datasets Catalog

Folks using Portia and Scrapy are engaged in a variety of fascinating web crawling projects, so we wanted to provide you with a way to share your data extraction prowess with the world. With this need in mind, we’re pleased to introduce the latest addition to our Scrapinghub platform: the Datasets Catalog! This new feature allows you to immediately share the results of your Scrapinghub projects as publicly searchable datasets. Not only is this a great way to collaborate with others, you…

Introducing the Crawlera Dashboard

We’ve been rolling out a lot of updates, upgrades, and new features lately, and we’re continuing this trend by announcing the very first Crawlera Dashboard! Crawlera is a smart downloader that allows you to crawl and scrape websites responsibly. It rotates IP addresses and keeps track of which ones have been blocked by websites, ensuring that your crawls continue uninterrupted. Since Crawlera has always been a mainstay of Scrapinghub, we wanted to revamp its presentation to help you crawl the web…

Data Extraction with Scrapy and Python 3

Scrapy 1.1 Release with Official Python 3 Support

Fasten your seat belts, ladies and gentlemen: Scrapy 1.1 with Python 3 support is officially out! After a couple of months of hard work and four release candidates, this is the first official Scrapy release to support Python 3. We know that many of you have been eagerly looking forward to moving your whole stack to Python 3. Well, wait no more: you can get rid of Python 2 once and for all…

Introducing Scrapy Cloud 2.0

Scrapy Cloud has been with Scrapinghub since the beginning, but we decided some spring cleaning was in order. To that end, we’re proud to announce Scrapy Cloud 2.0! This overhaul will help you improve and scale your web scraping projects. Among other perks, our upgraded cloud-based platform includes a brand new and much more flexible architecture based on containers. While much of this upgrade is behind the scenes, what’s most important to you is pricing changes, the introduction of Docker…

Machine Learning with Web Scraping: New MonkeyLearn Addon

Say Hello to the MonkeyLearn Addon

We deal in data. Vast amounts of it. But while we’ve traditionally been involved in providing you with the data that you need, we are now taking it a step further by helping you analyze it as well. To this end, we’d like to officially announce the MonkeyLearn integration for Scrapy Cloud. This feature will bring machine learning technology to the data that you extract through Scrapy Cloud. We also offer a MonkeyLearn Scrapy Middleware so…

Splash 2.0 Is Here with Qt 5 and Python 3

We’re pleased to announce that Splash 2.0 is officially live after many months of hard work. For those unfamiliar with Splash, it’s a headless browser we developed specifically for web crawling. Splash executes and renders JavaScript so you can deal with dynamic content. It also supports scripting so you can perform actions on the page. Splash is open source and fully integrated with Scrapy and Portia. You can also use its API to integrate with any project that needs to…
