The Web Data Extraction Summit was held last week, on 17th September, in Dublin, Ireland. This was the first-ever event dedicated to web scraping and data extraction. We had over 140 curious attendees, 16 great speakers from technical deep dives to business use cases, 12 amazing presentations, a customer panel discussion and unlimited Guinness.
A huge portion of the internet is news. It’s a very important type of content because there are always things happening either in our local area or globally that we want to know about. The amount of news published everyday on different sites is ridiculous. Sometimes it’s good news and sometimes it’s bad news but one thing’s for sure: it’s humanly impossible to read all of it everyday.
Product data - whether from e-commerce sites, auto listings or product reviews, offers a treasure trove of insights that can give your business an immense competitive edge in your market. Getting access to this data in a structured format can unleash new potential for not only business intelligence teams, but also their counterparts in marketing, sales, and management that rely on accurate...
Your spider is developed and we are getting our structured data daily, so our job is done, right?
Absolutely not! Website changes (sometimes very subtly), anti-bot countermeasures and temporary problems often reduce the quality and reliability of our data.
One negative review can cost your business up to 22% of its prospects. This was one of the sobering findings in a study highlighted on Moz last year. With over half of shoppers rating reviews as important in their buying decision, no company large or small can afford to ignore stats like these - let alone the reviews themselves. In what follows I'll let you in on how web scraping can help you...
XPath is a powerful language that is often used for scraping the web. It allows you to select nodes or compute values from an XML or HTML document and is actually one of the languages that you can use to extract web data using Scrapy. The other is CSS and while CSS selectors are a popular choice, XPath can actually allow you to do more.
Welcome to Scrapy Tips from the Pros! In this monthly column, we share a few tricks and hacks to help speed up your web scraping activities. As the lead Scrapy maintainers, we’ve run into every obstacle you can imagine so don’t worry, you’re in great hands. Feel free to reach out to us on Twitter or Facebook with any suggestions for future topics.