The History of Scrapinghub
Joanne O’Flynn meets with Pablo Hoffman and Shane Evans to find out what inspired them to set up web crawling company Scrapinghub.

Scrapinghub may be a very young company but it already has a great story to tell. Shane Evans from Ireland and Pablo Hoffman from Uruguay came together in a meeting of great web crawling minds to form the business after working together on the same project but for different companies.

In 2007, Shane was leading software development for MyDeco, a London-based startup. His team needed data to build the MyDeco systems and set about finding a company they could trust to deliver it. Frustrated by the lack of high-quality software and services available, Shane took matters into his own hands and, together with his team, created a framework for building web crawlers to the highest standard.

The framework worked well, and it didn't take long to write plugins for the most important websites they wanted data from. But supporting more websites and maintaining the framework demanded ongoing effort, so Shane set out to find someone to help.

Meanwhile, after studying computer science and electrical engineering, Pablo graduated in 2007. Soon after graduating, he set up Insophia, a Python development outsourcing company in Montevideo, Uruguay. One of Pablo’s main clients recommended Insophia to MyDeco and it wasn’t long before he was running the MyDeco web scraping team and helping to develop the data processing architecture.

Pablo could see massive potential in the code, and six months later he asked Shane if they could open source the web crawler. Cleverly combining the words 'Scrape' and 'Python', the Scrapy project was born. After Pablo spent many months refining it and releasing updates, Scrapy became quite popular, and word reached his ears that several high-profile companies, including a social media giant, were using their technology!

In 2010, after developing a fantastic working relationship, Shane and Pablo could see an opportunity to start a company that could really make a difference by providing web crawling services while continuing to advance open source projects. The two men decided to go into business together with one goal: to make it easier to get structured data from the internet.

With a tight and knowledgeable core group of developers and a huge drive to provide the most functional and efficient web crawlers, Shane and Pablo formed Scrapinghub. They started off as just a handful of hard-working programmers; by 2011 there were 10 employees, by 2012 there were 20 and staff numbers continued to double each year, up to the present where there are now almost 100 employees – or Scrapinghubbers as they affectionately call themselves – globally dedicated to developing the best web crawling and data processing solutions.

In 2014 alone, the company scraped and stored data from more than 10 billion pages (more than five times what it did in 2013), and a further 5 billion pages passed through Crawlera. 2015 is already off to a great start with the release of two new open source projects: ScrapyRT and Skinfer, a tool for inferring JSON schemas. The icing on the cake is Scrapinghub's February announcement of its participation in the DARPA Memex project. It's a testament to the knowledge and experience of a very dedicated team working together all around the world. From small beginnings come great things, and it's clear that for Scrapinghub, a very bright future awaits.

  1. Boy, do I remember those days! I spent countless hours with Daniel Graña stripping Shane's code out of the complex MyDeco system and turning it into a separate standalone library. Being a side project that I worked on in my spare time from MyDeco, this took more time than writing the original code itself, but it gained a few important features along the way that were necessary to sustain the growth Scrapy was soon to experience. Looking back, it was one of the best professional decisions I ever made, but it was all ultimately driven by love: the love of building and creating something useful.