Google Summer of Code 2015

We are very excited to be participating again this year on Google Summer of Code. After a successful experience last year where Julia Medina (now a proud Scrapinghubber!) worked on Scrapy API cleanup and per-spider settings, we are back again this year with 3 ideas approved:

Link Analysis Algorithms Explained

When scraping content from the web, you often crawl websites which you have no prior knowledge of. Link analysis algorithms are incredibly useful in these scenarios to guide the crawler to relevant pages.

EuroPython, here we go!

We are very excited about EuroPython 2015!

Aduana: Link Analysis with Frontera

It's not uncommon to need to crawl a large number of unfamiliar websites when gathering content. Page ranking algorithms are incredibly useful in these scenarios as it can be tricky to determine which pages are relevant to the content you're looking for.

Using git to manage vacations in a large distributed team

Here at Scrapinghub we are a remote team of 100+ engineers distributed among 30+ countries. As part of their standard contract, Scrapinghubbers get 20 vacation days per year and local country holidays off, and yet we spent almost zero time managing this. How do we do it?. The answer is “git” and here we explain how.

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Learn More

Recent Posts