Browsed by
Author: Elias Dorneles

How to Run Python Scripts in Scrapy Cloud

You can deploy, run, and maintain control over your Scrapy spiders in Scrapy Cloud, our production environment. Keeping control means you need to be able to know what’s going on with your spiders and to find out early if they are in trouble.

Meet Parsel: the Selector Library behind Scrapy

We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel. Parsel is a Python library for extracting data from XML/HTML text using CSS or XPath selectors. It powers the scraping API of the Scrapy framework.

Skinfer: A Tool for Inferring JSON Schemas

Imagine that you have a lot of samples for a certain kind of data in JSON format. Maybe you want to have a better feel of it, know which fields appear in all records, which appear only in some and what are their types. In other words, you want to know the schema for the data that you have.

XPath Tips from the Web Scraping Trenches

In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors. In case you're looking for a tutorial, here is a XPath tutorial with nice examples.

Be the first to know. Gain insights. Make better decisions.

Use web data to do all this and more. We’ve been crawling the web since 2010 and can provide you with web data as a service.

Tell me more

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Services page for inspiration.

Learn More

Recent Posts