What I Learned as a Google Summer of Code student at Scrapinghub

Google Summer of Code (GSoC) was such a great experience for students like me. I learned so much about open source communities as well as contributing to their complex projects. I also learned a great deal from my mentors, Konstantin and Cathal, about programming and software engineering practices. In my opinion, the most valuable lesson I got from GSoC was what it was like to be a Software...

GDPR Compliance For Web Scrapers: The Step-By-Step Guide

Unless you’ve been living under a rock for the past few months you know that the EU’s General Data Protection Regulation (GDPR) is upon us.

It is the most comprehensive data protection law ever been introduced, fundamentally changing the way companies can use the personal data of their customers and prospects.

There are countless articles and guides about how GDPR will affect your company’s...

For E-Commerce Data Scientists: Lessons Learned Scraping 100 Billion Products Pages

Web scraping can look deceptively easy these days. There are numerous open-source libraries/frameworks, visual scraping tools and data extraction tools that make it very easy to scrape data from a website. However, when you want to scrape websites at scale things start to get very tricky, very fast.

A Sneak Peek Inside What Hedge Funds Think of Alternative Financial Data

Unbeknownst to many, there is a data revolution happening in finance.

Want to Predict Fitbit’s Quarterly Revenue? Eagle Alpha Did It Using Web Scraped Product Data

Throughout the history of the financial markets information has been power. The trader with access to the most accurate information can quickly gain an edge over the market.

How Data Compliance Companies Are Turning To Web Crawlers To Take Advantage of the GDPR Business Opportunity

Over the last couple weeks, GDPR has brought data protection center stage. What was once a fringe concern for most businesses overnight became a burning problem that needed to be solved immediately.

Looking Back at 2017

It’s been another standout year for Scrapinghub and the scraping community at large. Together we crawled 79.1 billion pages (nearly double 2016), with over 103 billion scraped records; what a year!

A Faster, Updated Scrapinghub

We’re very excited to announce a new look for Scrapinghub!

Scraping the Steam Game Store with Scrapy

This is a guest post from the folks over at Intoli, one of the awesome companies providing Scrapy commercial support and longtime Scrapy fans.

Do Androids Dream of Electric Sheep?

It got very easy to do Machine Learning: you install a ML library like scikit-learn or xgboost, choose an estimator, feed it some training data, and get a model which can be used for predictions.

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Learn More

Recent Posts