Browsed by
Category: web-scraping

How to use a proxy in Puppeteer

Puppeteer is a high-level API for controlling headless Chrome. It’s one of the most popular tools for web automation and web scraping in Node.js. In web scraping, many developers use it to handle JavaScript rendering and web data extraction. In this article, we cover how to set up a proxy in Puppeteer and what your options are if you want to rotate proxies.

Building Blocks of an Unstoppable Web Scraping Infrastructure

More and more businesses leverage the power of web scraping, and extracting data from the web is becoming increasingly popular. But that doesn't mean the technical challenges are gone. Building a sustainable web scraping infrastructure takes expertise and experience. Here at Scrapinghub, we scrape 9 billion pages per month. In this article, we are going to summarize what the essential elements of web...

Backconnect Proxy: Explanation & Comparison To Other Proxies

Scaling up your web scraping project is not an easy task. Adding proxies is one of the first actions you will need to take. You will need to manage a healthy proxy pool to avoid bans. There are a lot of proxy services/providers, each having a whole host of different types of proxies. In this blog post, you are going to learn how backconnect proxies work and when you should use them.

Web Scraping Questions & Answers Part I

As you know, we held the first-ever Web Data Extraction Summit last month. During the talks, we had a lot of questions from the audience. We have divided the questions into two parts. In the first part, we will cover questions on Web Scraping at Scale - Proxy and Anti-Ban Best Practice, and Legal Compliance, GDPR in the World of Web Scraping. Enjoy! You can also check out the full talks on...

The Web Data Extraction Summit 2019

The Web Data Extraction Summit was held last week, on 17th September, in Dublin, Ireland. This was the first-ever event dedicated to web scraping and data extraction. We had over 140 curious attendees, 16 great speakers covering everything from technical deep dives to business use cases, 12 amazing presentations, a customer panel discussion and unlimited Guinness.

GDPR Update: Scraping Public Personal Data

One common misconception about scraping personal data is that public personal data does not fall under the GDPR. Many businesses assume that because the data has already been made public on another website, it is fair game to scrape. In fact, the GDPR makes no blanket exceptions for public personal data, and the same analysis as for any other personal data must be conducted prior to scraping...

Solution Architecture Part 5: Designing A Well-Optimised Web Scraping Solution

In the fifth and final post of this solution architecture series, we will share with you how we architect a web scraping solution, all the core components of a well-optimized solution, and the resources required to execute it.

Solution Architecture Part 4: Assessing The Technical Feasibility of Your Web Scraping Project

In the fourth post of this solution architecture series, we will share with you our step-by-step process for evaluating the technical feasibility of a web scraping project.

Visual Web Scraping Tools: What to Do When They Are No Longer Fit For Purpose?

Visual web scraping tools are great. They allow people with little to no technical know-how to extract data from websites with only a couple of hours of upskilling, making them great for simple lead generation, market intelligence and competitor monitoring projects. Removing countless hours of manual entry work for sales and marketing teams, researchers, and business intelligence teams in the...

Solution Architecture Part 3: Conducting a Web Scraping Legal Review

In this, the third post in our solution architecture series, we will share with you our step-by-step process for conducting a legal review of every web scraping project we work on.