Web Scraping Basics: A Developer’s Guide To Reliably Extract Data

The web is complex and constantly changing, which is one reason web data extraction can be difficult, especially in the long term. Before you try to extract data, you need to understand well how a website works. Luckily, there are lots of inspection and code tools available for this, and in this article we will show you some of our favorites.

Extracting Article & News Data: The Importance of Data Quality

Article and news data extraction is becoming increasingly popular among companies. Data quality plays a vital role in making sure these projects succeed. If the quality of the extracted articles is not good enough, your whole business could be at risk, especially if it depends on a constant flow of high-quality article data.

Price Gouging or Economics at Work: Price Intelligence to Track Consumer Sentiment

As the COVID-19 pandemic took hold, we at Scrapinghub began to wonder how it would affect the data we crawl, and whether that data could tell us something useful about the pandemic and its impact.

A Practical Guide to Web Data QA Part III: Holistic Data Validation Techniques

In case you missed them, here are the first and second parts of the series.

Product Reviews API (Beta): Extract Product Reviews at Scale

We are excited to announce our next AutoExtract API: Product Reviews API (Beta). Using this API, you can get access to product reviews in a structured format, without writing site-specific code, and extract them from eCommerce sites at scale. Just make a request to the API and receive your data in real time!

Custom crawling & News API: designing a web scraping solution

Web scraping projects usually involve data extraction from many websites. The standard approach to this problem is to write code that navigates and extracts the data from each website. However, this approach may not scale well in the long term, because each website requires its own maintenance effort; it also doesn’t scale in the short term, when we need to start the extraction process in a...

Vehicle API (Beta): Extract Automotive Data at Scale

Today we are delighted to launch a Beta of our newest data extraction API: AutoExtract Vehicle API. With this API you can collect structured data from web pages that contain automotive data, such as classified or dealership sites, without writing site-specific code. If you need automotive/vehicle data, sign up now for a beta version of our Vehicle API.

A Practical Guide to Web Data Extraction QA Part II: Common validation pitfalls

In case you missed the first part of this series, where we went through data validation techniques, you can read it now: A Practical Guide To Web Data Extraction QA Part I: Validation Techniques

Transitioning to Remote Working as a Company

I’d like to echo Joel Gascoigne’s sentiments: This is not normal remote working!

Like Buffer, we’ve been a remote-first company for almost 10 years, and we’re also adjusting to the new normal as a result of COVID-19.

A Practical Guide to Web Data QA Part I: Validation Techniques

When it comes to web scraping at scale, there’s a set of challenges you need to overcome to extract the data. But once you are able to get it, you still have work to do. You need to have a data QA process in place. Data quality becomes especially crucial if you’re extracting high volumes of data from the web regularly and your team’s success depends on the quality of the scraped data.