Author: Pablo Hoffman

Introducing Crawlera free trials & new plans

One of the biggest pain points we heard from our Crawlera customers last year was the inconvenience of having to jump from one Crawlera plan to another when more requests are needed in a month. For this reason, we have been rethinking our Crawlera plans to better accommodate these cases and be more flexible with customers whose crawling requirements vary from month to...

Google Summer of Code 2015

We are very excited to be participating again this year in Google Summer of Code. After a successful experience last year, when Julia Medina (now a proud Scrapinghubber!) worked on the Scrapy API cleanup and per-spider settings, we are back this year with three ideas approved:

Using git to manage vacations in a large distributed team

Here at Scrapinghub we are a remote team of 100+ engineers distributed across 30+ countries. As part of their standard contract, Scrapinghubbers get 20 vacation days per year plus their local country holidays, and yet we spend almost zero time managing this. How do we do it? The answer is git, and here we explain how.
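The post explains the details, but the core idea can be sketched as follows. This is a hypothetical illustration, not the exact setup described in the post: the repository layout, file naming and dates are all assumptions.

```shell
# Hypothetical sketch: one file per engineer in a shared "vacations" repo,
# so a vacation request is just a commit, reviewed like any other change.
git init -q vacations
echo "2015-08-03..2015-08-14  summer vacation" > vacations/pablo.txt
git -C vacations add pablo.txt
git -C vacations -c user.name="Pablo" -c user.email="pablo@example.org" \
    commit -q -m "Pablo: request vacation Aug 3-14"
# The shared history doubles as the team calendar:
git -C vacations log --oneline
```

In practice the commit would go on a branch and be merged via a pull request, which gives approval and an audit trail for free.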

Why we moved to Slack

We are veterans in the group chat arena. We have been using one form or another since we started Scrapinghub in 2010, and I have personally used corporate group chats since 2004. We started Scrapinghub with our own hosted version of ejabberd, then moved to HipChat in 2013, and we just finished moving to Slack. Thanks to Slack's migration tools, the process went pretty smoothly. In this post...

Open Source at Scrapinghub

Here at Scrapinghub we love open source. We love using it and contributing to it. Over the years we have open sourced a few projects that we use over and over, in the hope that they will make others' lives easier. Writing reusable code is harder than it sounds, but it enforces good practices such as accurate documentation, extensive testing and attention to backwards compatibility. In the end...

Marcos Campal Is a Scrapinghubber!

We’re excited to welcome Marcos Campal to the Scrapinghub engineering team.

Introducing Crawlera, a Smart Page Downloader

We are proud to introduce Crawlera, a smart web downloader designed specifically for web crawling.

Git Workflow for Scrapy Projects

Our customers often ask us what the best workflow is for working with Scrapy projects. A popular approach we have seen and used in the past is to split the spiders folder (typically project/spiders) into two folders, project/spiders_prod and project/spiders_dev, and use the SPIDER_MODULES setting to control which spiders are loaded in each environment. This works reasonably well, until you have to...
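A minimal sketch of how that split might look in a project's settings.py. The SCRAPY_ENV environment variable and the "project" package name are assumptions for illustration; only SPIDER_MODULES itself is a real Scrapy setting:

```python
# settings.py: a minimal sketch of the prod/dev spider split.
# SCRAPY_ENV and the "project" package name are hypothetical;
# SPIDER_MODULES is the standard Scrapy setting listing the
# modules where Scrapy looks for spiders.
import os

if os.environ.get("SCRAPY_ENV") == "dev":
    # Development sees both folders, so new spiders can be tested
    # alongside the production ones.
    SPIDER_MODULES = ["project.spiders_prod", "project.spiders_dev"]
else:
    # Production only loads the vetted spiders.
    SPIDER_MODULES = ["project.spiders_prod"]
```

Switching environments is then just a matter of exporting SCRAPY_ENV=dev before running the crawler.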

How to Fill Login Forms Automatically

We often have to write spiders that need to log in to sites in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest.

Spiders activity graphs

Today we are introducing a new feature called spider activity graphs. These let you quickly visualize how your spiders are performing, and they are a very useful tool on busy projects for finding spiders that are not working as expected.