Large scale web scraping

From inconsistent website layouts that break our extraction logic to badly written HTML, web scraping comes with its share of difficulties. Over the last few years, the single most important challenge in web scraping has been to actually get to the data - and not get blocked. This is due to the antibots or the underlying technologies that websites use to protect their data. Proxies are a major...

A Practical Guide to Web Data QA Part V: Broad Crawls

If you haven’t read the previous ones, here’s the first part, second part, third part and fourth part of the series.

Announcing The Web Data Extraction Summit 2020

Web data extraction has become one of the most important tools for businesses to grow and stay ahead of the competition. From developing better pricing strategies to identifying hidden risks and building better products, web data extraction provides the power to transform infinite web data into a structured format that can help you make profitable decisions.

News & Article Data Extraction: Open Source vs Closed Source Solutions

Article extraction is the process of extracting data fields from an article page and putting them into a machine-readable structured format like JSON. In many use cases, the article page that you want to extract is a news page, but it can be any other type of article. Based on our more than 10 years of experience in the web data extraction industry, the demand for structured article data is getting...


If you haven’t read the previous ones, here’s the first part, the second and third part of the series.

Real Estate: Use Web Data Extraction to Make Smarter Decisions

As the internet continues to grow, the amount of data it generates grows with it, opening new opportunities to improve processes and make more informed decisions. Real estate is one of the many industries that are being disrupted by data-related technologies and innovations. Whether you are a broker, realtor, investor, or property manager, you have the potential to become data-driven and gain...

Scrapy Cloud Secrets: Hub Crawl Frontier and How To Use It

Imagine a long crawling process, like extracting data from a website for a whole month. We can start it and leave it running until we get the results. Still, we can agree that a whole month is plenty of time for something to go wrong. The target website can go down for a few minutes or hours, there can be some sort of power outage at your crawling server or even some other internet connection...

Blog Comments API (BETA): Extract Blog Comment Data at Scale

A reliable and scalable way to tap into blog-comment-driven insights

We are excited to announce our newest data extraction API. The Blog Comments API is now publicly available as a BETA release.

Your Price Intelligence Questions Answered

What is Price Intelligence?

Price Intelligence is the practice of leveraging web data to make better pricing, marketing, and business decisions. Basically, it is all about using the available data to optimize your pricing strategy: making it more competitive, increasing profitability, and ultimately improving your business performance.

Data Center Proxies vs. Residential Proxies

In this blog post, you will learn the main difference between data center proxies and residential proxies, and when to use each type in your web data extraction project to maximize successful requests.