How to set up a custom proxy in Scrapy?

When scraping the web at a reasonable scale, you can come across a series of problems and challenges. You may want to access a website from a specific country/region. Or maybe you want to work around anti-bot solutions. Whatever the case, to overcome these obstacles you need to use and manage proxies. In this article, I'm going to cover how to set up a custom proxy inside your Scrapy spider in...

GDPR Update: Scraping Public Personal Data

One common misconception about scraping personal data is that public personal data does not fall under the GDPR. Many businesses assume that because the data has already been made public on another website that it is fair game to scrape. In actuality, GDPR makes no blanket exceptions for public personal data and the same analysis for any other personal data must be conducted prior to scraping...

Solution Architecture Part 5: Designing A Well-Optimised Web Scraping Solution

In the fifth and final post of this solution architecture series, we will share with you how we architect a web scraping solution, all the core components of a well-optimized solution, and the resources required to execute it.

Solution Architecture Part 4: Accessing The Technical Feasibility of Your Web Scraping Project

In the fourth post of this solution architecture series, we will share with you our step-by-step process for evaluating the technical feasibility of a web scraping project.

Visual Web Scraping Tools: What to Do When They Are No Longer Fit For Purpose?

Visual web scraping tools are great. They allow people with little to no technical know-how to extract data from websites with only a couple hours of upskilling, making them great for simple lead generation, market intelligence and competitor monitoring projects. Removing countless hours of manual entry work for sales and marketing teams, researchers, and business intelligence team in the...

Solution Architecture Part 3: Conducting a Web Scraping Legal Review

In this the third post in our solution architecture series, we will share with you our step-by-step process for conducting a legal review of every web scraping project we work on.

ScrapyRT: Turn Websites Into Real-Time APIs

If you’ve been using Scrapy for any period of time, you know the capabilities a well-designed Scrapy spider can give you.

Web Data Analysis: Exposing NFL Player Salaries With Python

Football. From throwing a pigskin with your dad, to crunching numbers to determine the probability of your favorite team winning the Super Bowl, it is a sport that's easy to grasp yet teeming with complexity. From game to game, the amount of complex data associated with every team - and every player - increases, creating a more descriptive, timely image of the League at hand.

By using web...

From The Creators Of Scrapy: Artificial Intelligence Data Extraction API

To accurately extract data from a web page, developers usually need to develop custom code for each website. This is manageable and recommended for tens or hundreds of websites and where data quality is of the utmost importance, but if you need to extract data from thousands of sites, or rapidly extract data from sites that are not yet covered by pre-existing code, this is often an...

Scrapinghub’s New AI Powered Developer Data Extraction API for E-Commerce & Article Extraction

Today, we’re delighted to announce the launch of the beta program for Scrapinghub’s new AI powered developer data extraction API for automated product and article extraction.

After much development and refinement with alpha users, our team have refined this machine learning technology to the point that data extraction engine is capable of automatically identifying common items on product and...

Sign up now

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Follow Us

Learn More

Recent Posts