Browsed by
Category: autoextract

Extracting Article & News Data: The Importance of Data Quality

Article and news data extraction is becoming increasingly popular and widely used by companies. Data quality plays a vital role in making sure these projects succeed. If the quality of the extracted articles is not good enough, your whole business could be at risk, especially if it depends on the constant flow of high quality article data.

Product Reviews API (Beta): Extract Product Reviews at Scale

We are excited to announce our next AutoExtract API: Product Reviews API (Beta). Using this API, you can get access to product reviews in a structured format, without writing site-specific code. You can use the Product Reviews API to extract product reviews from eCommerce sites at scale. Just make a request to the API and receive your data in real-time!

Custom crawling & News API: designing a web scraping solution

Web scraping projects usually involve data extraction from many websites. The standard approach to tackle this problem is to write some code to navigate and extract the data from each website. However, this approach may not scale so nicely in the long-term, requiring maintenance effort for each website; it also doesn’t scale in the short-term, when we need to start the extraction process in a...

Extracting clean article HTML with News API

The Internet offers a vast amount of written content in the form of articles, news, blog posts, stories, essays, tutorials that can be leveraged by many useful applications:

Job Postings Beta API: Extract Job Postings at Scale

We’re excited to announce our newest data extraction API, Job Postings API. From now on, you can use AutoExtract to extract Job Postings data from many job boards and recruitment sites. Without writing any custom data extraction code!

News Web Data Extraction to Predict Irish Election Results

Can pre-election news coverage of political parties predict the trend of the elections?

On February 9th, 2020, Ireland elected a new parliament. Prior to the elections, the political parties invested a lot of time, money and energy to get their political message to the people. A lot of research goes into selecting the right platform and the right medium.

Looking Back at 2019

2019 was an exciting year for Scrapinghub. We created things we have never created before and did things nobody in our industry had ever done before. Let’s revisit what happened in 2019!

Scrapy & AutoExtract API integration

We’ve just released a new open-source Scrapy middleware which makes it easy to integrate AutoExtract into your existing Scrapy spider. If you haven’t heard about AutoExtract yet, it’s an AI-based web scraping tool which automatically extracts data from web pages without the need to write any code. Learn more about AutoExtract here.

News Data Extraction at Scale with AI powered AutoExtract

A huge portion of the internet is news. It’s a very important type of content because there are always things happening either in our local area or globally that we want to know about. The amount of news published everyday on different sites is ridiculous. Sometimes it’s good news and sometimes it’s bad news but one thing’s for sure: it’s humanly impossible to read all of it everyday.