Browsed by
Author: Attila Tóth

Scrapy & AutoExtract API integration

We’ve just released a new open-source Scrapy middleware which makes it easy to integrate AutoExtract into your existing Scrapy spider. If you haven’t heard about AutoExtract yet, it’s an AI-based web scraping tool which automatically extracts data from web pages without the need to write any code. Learn more about AutoExtract here.

Web Scraping Questions & Answers Part I

As you know we held the first ever Web Data Extraction Summit last month. During the talks, we had a lot of questions from the audience. We have divided the questions into two parts - in the first part, we will cover questions on Web Scraping at Scale - Proxy and Anti-Ban Best Practice, and Legal Compliance, GDPR in the World of Web Scraping. Enjoy! You can also check out the full talks on...

Price intelligence with Python: Scrapy, SQL and Pandas

In this article I will guide you through a web scraping and data visualization project. We will extract e-commerce data from real e-commerce websites then try to get some insights out of it. The goal of this article is to show you how to get product pricing data from the web and what are some ways to analyze pricing data. We will also look at how price intelligence makes a real difference for...

The Web Data Extraction Summit 2019

The Web Data Extraction Summit was held last week, on 17th September, in Dublin, Ireland. This was the first-ever event dedicated to web scraping and data extraction. We had over 140 curious attendees, 16 great speakers from technical deep dives to business use cases, 12 amazing presentations, a customer panel discussion and unlimited Guinness.

News Data Extraction at Scale with AI powered AutoExtract

A huge portion of the internet is news. It’s a very important type of content because there are always things happening either in our local area or globally that we want to know about. The amount of news published everyday on different sites is ridiculous. Sometimes it’s good news and sometimes it’s bad news but one thing’s for sure: it’s humanly impossible to read all of it everyday.

Learn how to configure and utilize proxies with Python Requests module

Sending HTTP requests in Python is not necessarily easy. We have built-in modules like urllib, urllib2 to deal with HTTP requests. Also, we have third-party tools like Requests. Many developers use Requests because it is high level and designed to make it extremely easy to send HTTP requests.

How to set up a custom proxy in Scrapy?

When scraping the web at a reasonable scale, you can come across a series of problems and challenges. You may want to access a website from a specific country/region. Or maybe you want to work around anti-bot solutions. Whatever the case, to overcome these obstacles you need to use and manage proxies. In this article, I'm going to cover how to set up a custom proxy inside your Scrapy spider in...

Web Data Analysis: Exposing NFL Player Salaries With Python

Football. From throwing a pigskin with your dad, to crunching numbers to determine the probability of your favorite team winning the Super Bowl, it is a sport that's easy to grasp yet teeming with complexity. From game to game, the amount of complex data associated with every team - and every player - increases, creating a more descriptive, timely image of the League at hand.

By using web...