Google Summer of Code 2015

We are very excited to be participating in Google Summer of Code again this year. After a successful experience last year, when Julia Medina (now a proud Scrapinghubber!) worked on the Scrapy API cleanup and per-spider settings, we are back with three approved ideas:

Link Analysis Algorithms Explained

When scraping content from the web, you often crawl websites you have no prior knowledge of. Link analysis algorithms are incredibly useful in these scenarios for guiding the crawler to relevant pages.
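To give a flavour of what such an algorithm does, here is a minimal, illustrative PageRank power iteration over a tiny in-memory link graph. This is not code from the post, just a sketch of the general idea: scores flow along links, so pages that are linked to by important pages become important themselves.

```python
def pagerank(links, damping=0.85, iterations=20):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    score = {page: 1.0 / len(pages) for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score (the "random jump" term).
        new_score = {page: (1.0 - damping) / len(pages) for page in pages}
        for src, dsts in links.items():
            if not dsts:
                continue  # dangling pages simply don't redistribute their score here
            share = damping * score[src] / len(dsts)
            for dst in dsts:
                new_score[dst] += share
        score = new_score
    return score

# Toy graph: "hub" is linked to by several pages, so it ends up with the top score.
links = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": []}
for page, value in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {value:.3f}")
```

Real crawls work over far larger, disk-backed link graphs, but the core propagation step is the same.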

EuroPython, here we go!

We are very excited about EuroPython 2015!

Aduana: Link Analysis with Frontera

It's not uncommon to need to crawl a large number of unfamiliar websites when gathering content. Page ranking algorithms are incredibly useful in these scenarios, since it can be tricky to determine which pages are relevant to the content you're looking for.
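As a hypothetical sketch (this is not Aduana's or Frontera's actual API), once each candidate URL has a link-based score, the crawl frontier can simply pop the highest-scored URL next instead of crawling breadth-first:

```python
import heapq

class ScoredFrontier:
    """Toy frontier that always yields the highest-scored pending URL."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so negate the score to pop the best URL first.
            heapq.heappush(self._heap, (-score, url))

    def next_url(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

frontier = ScoredFrontier()
frontier.add("http://example.com/relevant", score=0.9)
frontier.add("http://example.com/boring", score=0.1)
print(frontier.next_url())  # the highest-scored page is crawled first
```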

Using git to manage vacations in a large distributed team

Here at Scrapinghub we are a remote team of 100+ engineers distributed across 30+ countries. As part of their standard contract, Scrapinghubbers get 20 vacation days per year plus their local country holidays off, and yet we spend almost zero time managing this. How do we do it? The answer is "git", and here we explain how.