Browsed by
Category: data-extraction

A Practical Guide to Web Data QA Part IV: Complementing Semi-Automated Techniques

If you haven’t read the previous installments, start with the first, second, and third parts of the series.

Real Estate: Use Web Data Extraction to Make Smarter Decisions

As the internet continues to grow, the amount of data it generates grows with it, opening new opportunities to improve processes and make more informed decisions. Real estate is one of the many industries being disrupted by data-related technologies and innovations. Whether you are a broker, realtor, investor, or property manager, you have the potential to become data-driven and gain...

Scrapy Cloud Secrets: Hub Crawl Frontier and How To Use It

Imagine a long crawling process, like extracting data from a website for a whole month. We can start it and leave it running until we get the results. Still, a whole month is plenty of time for something to go wrong. The target website can go down for a few minutes or hours, there can be a power outage at your crawling server, or even some other internet connection...

Web Scraping Basics: A Developer’s Guide To Reliably Extract Data

The web is complex and constantly changing. This is one of the reasons why web data extraction can be difficult, especially in the long term. You need to understand how a website works before you try to extract data from it. Luckily, there are lots of inspection and code tools available for this, and in this article we will show you some of our favorites.

Extracting clean article HTML with News API

The Internet offers a vast amount of written content in the form of articles, news, blog posts, stories, essays, and tutorials that can be leveraged by many useful applications:

Job Postings Beta API: Extract Job Postings at Scale

We’re excited to announce our newest data extraction API, Job Postings API. From now on, you can use AutoExtract to extract job postings data from many job boards and recruitment sites, without writing any custom data extraction code!

News Web Data Extraction to Predict Irish Election Results

Can pre-election news coverage of political parties predict the trend of the elections?

On February 9th, 2020, Ireland elected a new parliament. Prior to the elections, the political parties invested a lot of time, money and energy to get their political message to the people. A lot of research goes into selecting the right platform and the right medium.

Looking Back at 2019

2019 was an exciting year for Scrapinghub. We created things we have never created before and did things nobody in our industry had ever done before. Let’s revisit what happened in 2019!

How to use a proxy in Puppeteer

Puppeteer is a high-level API for headless Chrome. It’s one of the most popular tools for web automation and web scraping in Node.js. In web scraping, many developers use it to handle JavaScript rendering and web data extraction. In this article, we are going to cover how to set up a proxy in Puppeteer and what your options are if you want to rotate proxies.

Building Blocks of an Unstoppable Web Scraping Infrastructure

More and more businesses leverage the power of web scraping, and extracting data from the web is becoming popular. But that doesn't mean the technical challenges are gone. Building a sustainable web scraping infrastructure takes expertise and experience. Here at Scrapinghub, we scrape 9 billion pages per month. In this article, we are going to summarize what the essential elements of web...