We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel. Parsel is a Python library for extracting data from XML/HTML text using CSS or XPath selectors. It powers the scraping API of the Scrapy framework.
Welcome to Scrapy Tips from the Pros! In this monthly column, we share a few tricks and hacks to help speed up your web scraping activities. As the lead Scrapy maintainers, we’ve run into every obstacle you can imagine so don’t worry, you’re in great hands. Feel free to reach out to us on Twitter or Facebook with any suggestions for future topics.
Many governments worldwide have laws enforcing them to publish their expenses, contracts, decisions, and so forth, on the web. This is so the general public can monitor what their representatives are doing on their behalf.