Author: Paul Tremberth

This Month in Open Source at Scrapinghub August 2016

Welcome to This Month in Open Source at Scrapinghub! In this regular column, we share all the latest updates on our open source projects including Scrapy, Splash, Portia, and Frontera. If you’re interested in learning more or even becoming a contributor, reach out to us by emailing opensource@scrapinghub.com or on Twitter @scrapinghub.

Scrapy

This past May, Scrapy 1.1 (with Python 3 support) was a big milestone for our Python web scraping community. And 2 weeks ago, Scrapy reached 15k stars…

This Month in Open Source at Scrapinghub June 2016

Welcome to This Month in Open Source at Scrapinghub! In this regular column, we share all the latest updates on our open source projects including Scrapy, Splash, Portia, and Frontera. If you’re interested in learning more or even becoming a contributor, reach out to us by email at opensource@scrapinghub.com or on Twitter @scrapinghub.

Scrapy 1.1

For those who missed the big news, Scrapy 1.1 is live! It’s the first official release that comes with Python 3 support, so you can…

Data Extraction with Scrapy and Python 3

Scrapy 1.1 Release with Official Python 3 Support

Fasten your seat belts, ladies and gentlemen: Scrapy 1.1 with Python 3 support is officially out! After a couple of months of hard work and four release candidates, this is the first official Scrapy release to support Python 3. We know that many of you have been eagerly looking forward to moving your whole stack to Python 3. Well, wait no more: you can get rid of Python 2 once and for all…

This Month in Open Source at Scrapinghub March 2016

Welcome to This Month in Open Source at Scrapinghub! In this monthly column, we share all the latest updates on our open source projects including Scrapy, Splash, Portia, and Frontera. If you’re interested in learning more or even becoming a contributor, reach out to us by email at opensource@scrapinghub.com or on Twitter @scrapinghub.

Scrapy

The big news for Scrapy lately is that Python 3 is now supported for the majority of use cases, the exceptions being FTP and…

Extracting schema.org Microdata Using Scrapy Selectors and XPath

EDIT (2015-10-30): We have released an lxml-based version of this code as an open source library called extruct. Source code is on GitHub, and the package is available on PyPI. Enjoy!

Web pages are full of data; that is what web scraping is mostly about. But often you want more than data: you want meaning. Microdata markup embedded in HTML source helps machines understand what the pages are about: contact information, product reviews, events, etc. Web authors have several ways to…
