Spidermon: Scrapinghub’s Secret Sauce To Our Data Quality & Reliability Guarantee

If you know anything about Scrapinghub, you know that we are obsessed with data quality and data reliability.

Outside of building some of the most powerful web scraping tools in the world, we also specialise in helping companies extract the data they need for their mission-critical business requirements. Most notably, companies that:

  • Rely on web data to make critical business decisions, or
  • ...

Meet Spidermon: Scrapinghub’s Battle Tested Spider Monitoring Library [Now Open Sourced]

Your spider is developed and you are getting your structured data daily, so your job is done, right?

Absolutely not! Website changes (sometimes very subtle ones), anti-bot countermeasures and temporary problems often reduce the quality and reliability of your data.

Proxy Management: Should I Build My Proxy Infrastructure In-House Or Use An Off-The-Shelf Proxy Solution?

Proxy management is the thorn in the side of most web scrapers. Without a robust and fully featured proxy infrastructure, you will often experience constant reliability issues and hours spent putting out proxy fires.

This is a situation no web scraping professional wants to deal with. We web scrapers are interested in extracting and using web data, not managing proxies.

In this article, we’re going to...

A Sneak Peek Inside Crawlera: The World’s Smartest Web Scraping Proxy Network

“How does Scrapinghub Crawlera work?” is the most common question we get from customers who struggled for months (or years) with constant proxy issues, only to see them disappear completely when they switched to Crawlera.

Today we’re going to give you a behind-the-scenes look at Crawlera so you can see for yourself why it is the world’s smartest web scraping proxy network and the...

Why We Created Crawlera: The World’s Smartest Web Scraping Proxy Network

Let’s face it, managing your proxy pool can be an absolute pain and the biggest bottleneck to the reliability of your web scraping!

Nothing annoys developers more than crawlers failing because their proxies are continuously being banned.

The Rise of Web Data in Hedge Fund Decision Making & The Importance of Data Quality

Over the past few years, there has been an explosion in the use of alternative data sources in the investment decision-making of hedge funds, investment banks and private equity firms.

These new data sources, collectively known as “alternative data”, have the potential to give firms a crucial informational edge in the market, enabling them to generate alpha.

The Predictive Power of Web Scraped Product Data For Institutional Investors: A GoPro Case Study

Investors understand the importance of high-quality information. It minimizes risk, empowers decision-making, and enables investors of all sizes to obtain alpha. As the old adage goes, knowing is often half the battle.

Knowing this, alternative data providers wield vast, nontraditional datasets derived from hundreds of millions of sources, not only enabling asset managers to consistently obtain...

The Challenges E-Commerce Retailers Face Managing Their Web Scraping Proxies

These days, web scraping is ubiquitous among the big e-commerce companies, thanks to the advantages data-driven decision-making brings to staying competitive in such a tight-margin business.

E-commerce companies are increasingly using web data to fuel their competitor research, dynamic pricing and new product research.

For these e-commerce sites, the most important consideration is: the ...

Looking Back at 2018

What a year 2018 has been for Scrapinghub!

It’s hard to know where to start…

This year has seen tremendous growth at Scrapinghub, setting us up to have a great 2019.

Here are some of the highlights of 2018…

Do What is Right Not What is Easy!

I was recently invited to speak at the IAPP Europe Data Protection Congress in Brussels about web scraping and GDPR. The panel also included Claire François of Hunton Andrews Kurth and Peter Brown from the Information Commissioner’s Office (ICO). For more information, check out my blog post on this topic, “GDPR Compliance for Web Scrapers: The Step-by-Step Guide”.