Scrapinghub’s New AI Powered Developer Data Extraction API for E-Commerce & Article Extraction

Today, we’re delighted to announce the launch of the beta program for Scrapinghub’s new AI powered developer data extraction API for automated product and article extraction.

After much development and refinement with alpha users, our team have refined this machine learning technology to the point that data extraction engine is capable of automatically identifying common items on product and...

Solution Architecture Part 2: How to Define The Scope of Your Web Scraping Project

In this the second post in our solution architecture series, we will share with you our step-by-step process for data extraction requirement gathering.

How to Architect a Web Scraping Solution: The Step-by-Step Guide

For many people (especially non-techies), trying to architect a web scraping solution for their needs and estimate the resources required to develop it, can be a tricky process.

Oftentimes, this is their first web scraping project and as a result have little reference experience to draw upon when investigating the feasibility of a data extraction project.

In this series of articles we’re going...

Navigating Compliance When Extracting Web Scraped Alternative Financial Data

When it comes to using web data as alternative data for investment decision making, one topic rules them all: compliance.

Regulatory compliance is such a pervasive issue in alternative data for finance, that it is often the number one barrier to investment firms using web data in their decision making processes. And matters aren’t helped by the regulatory ambiguity.

In this article, we’re...

St Patrick’s Day Special: Finding Dublin’s Best Pint of Guinness With Web Scraping

St Patrick’s Day Special: Finding Dublin’s Best Pint of Guinness With Web Scraping

At Scrapinghub we are known for our ability to help companies make mission critical business decisions through the use of web scraped data.

But for anyone who enjoys a freshly poured pint of stout, there is one mission critical question that creates a debate like no other…

“Who serves the best pint of Guinness?”

Spidermon: Scrapinghub’s Secret Sauce To Our Data Quality & Reliability Guarantee

If you know anything about Scrapinghub, you know that we are obsessed with data quality and data reliability.

Outside of building some of the most powerful web scraping tools in the world, we also specialise in helping companies extract the data they need for their mission-critical business requirements. Most notably companies who:

  • Rely on web data to make critical business decisions, or;
  • ...

Meet Spidermon: Scrapinghub’s Battle Tested Spider Monitoring Library [Now Open Sourced]

Your spider is developed and we are getting our structured data daily, so our job is done, right?

Absolutely not! Website changes (sometimes very subtly), anti-bot countermeasures and temporary problems often reduce the quality and reliability of our data.

Proxy Management: Should I Build My Proxy Infrastructure In-House Or Use AN Off-The-Shelf Proxy Solution?

Proxy management is the thorn in the side of most web scrapers. Without a robust and fully featured proxy infrastructure, you will often experience constant reliability issues and hours spent putting out proxy fires.

A situation no web scraping professional wants to deal with. Us web scrapers are interested in extracting and using web data, not managing proxies.

In this article, we’re going to...

A Sneak Peek Inside Crawlera: The World’s Smartest Web Scraping Proxy Network

“How does Scrapinghub Crawlera work?” is the most common question we get asked from customers who after struggling for months (or years) with constant proxy issues, only to have them disappear completely when they switch to Crawlera. 

Today we’re going to give you a behind the scenes look at Crawlera so you can see for yourself why it is the world’s smartest web scraping proxy network and the...

Why We Created Crawlera? The World’s Smartest Web Scraping Proxy Network

Let’s face it, managing your proxy pool can be an absolute pain and the biggest bottleneck to the reliability of your web scraping! 

Nothing annoys developers more than crawlers failing because their proxies are continuously being banned.

Blog banner - 1200x250 – 1
Sign up now

Web Data Extraction Summit 2019

presented by Scrapinghub

Dublin, Ireland 
17th September 2019

GET TICKETS

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Follow Us

Learn More

Recent Posts