Browsed by
Author: Elias Dorneles

How to Run Python Scripts in Scrapy Cloud

You can deploy, run, and maintain control over your Scrapy spiders in Scrapy Cloud, our production environment. Keeping control means you need to be able to know what’s going on with your spiders and to find out early if they are in trouble. This is one of the reasons why being able to run any Python script in Scrapy Cloud is a nice feature. You can customize to your heart’s content and automate any crawling-related tasks that you may need…

Read More

Meet Parsel: the Selector Library behind Scrapy

We eat our own spider food: Scrapy is our go-to workhorse on a daily basis. However, there are situations where Scrapy can be overkill, and that’s when we use Parsel. Parsel is a Python library for extracting data from XML/HTML text using CSS or XPath selectors. It powers the scraping API of the Scrapy framework. (Not to be confused with Parseltongue/Parselmouth.) We extracted Parsel from Scrapy during EuroPython 2015 as part of porting Scrapy to Python 3…

Read More

Skinfer: A Tool for Inferring JSON Schemas

Imagine that you have a lot of samples for a certain kind of data in JSON format. Maybe you want to get a better feel for it: which fields appear in all records, which appear only in some, and what their types are. In other words, you want to know the schema of the data you have. We’d like to introduce Skinfer, a tool we built for inferring the schema from samples in JSON format. Skinfer…
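
To make the idea of "knowing the schema" concrete, here is a minimal sketch in plain Python. It is not Skinfer's implementation or API; the function name `summarize_fields` and the sample records are hypothetical. It only looks at top-level fields and reports Python type names rather than JSON Schema types.

```python
import json
from collections import defaultdict


def summarize_fields(samples):
    """Hypothetical helper: for each top-level field, report whether it
    appears in every record and which Python types its values take."""
    counts = defaultdict(int)
    types = defaultdict(set)
    for record in samples:
        for key, value in record.items():
            counts[key] += 1
            # Python type names (str, int, NoneType), not JSON Schema names
            types[key].add(type(value).__name__)
    total = len(samples)
    return {
        key: {"required": counts[key] == total, "types": sorted(types[key])}
        for key in counts
    }


# Made-up sample records, for illustration only
raw_samples = [
    '{"name": "Ana", "age": 31}',
    '{"name": "Bob", "age": null, "email": "bob@example.com"}',
    '{"name": "Cy", "age": 27}',
]
summary = summarize_fields([json.loads(s) for s in raw_samples])
for field, info in sorted(summary.items()):
    print(field, info)
```

In this sketch a field counts as "required" when it shows up in every sample, and a field that takes more than one type (here `age`, which is sometimes null) is listed with all the types seen.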

Read More

XPath Tips from the Web Scraping Trenches

In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors. In case you’re looking for a tutorial, here is an XPath tutorial with nice examples. In this post, we’ll show you some tips we found valuable when using XPath in the trenches, using the Scrapy Selector API for our examples. Avoid using contains(.//text(), 'search text') in your XPath conditions. Use…

Read More