We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel. Parsel is a Python library for extracting data from XML/HTML text using CSS or XPath selectors. It powers the scraping API of the Scrapy framework.
Welcome to Scrapy Tips from the Pros! In this monthly column, we share a few tricks and hacks to help speed up your web scraping activities. As the lead Scrapy maintainers, we’ve run into every obstacle you can imagine so don’t worry, you’re in great hands. Feel free to reach out to us on Twitter or Facebook with any suggestions for future topics.
Many governments worldwide have laws enforcing them to publish their expenses, contracts, decisions, and so forth, on the web. This is so the general public can monitor what their representatives are doing on their behalf.
Note: Portia is no longer available for new users. It has been disabled for all the new organisations from August 20, 2018 onward.
Unlike Portia labiata, the hunting spider that feeds on other spiders, our Portia feeds on data. Considered the Einstein in the spider world, we modeled our own creation after the intelligence and visual abilities of its arachnid namesake.