In the US alone, there were 27 million individuals running or starting a new business in 2015. In this fiercely competitive startup scene, business owners need to take advantage of every available resource, especially given the high probability of failure. Enter web data. Web data is abundant, and those who harness it can do everything from keeping an eye on competitors to ensuring customer...
XPath is a powerful language that is often used for scraping the web. It allows you to select nodes or compute values from an XML or HTML document, and it is one of the two languages you can use to extract web data with Scrapy. The other is CSS, and while CSS selectors are a popular choice, XPath can do more.
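To make the idea of selecting nodes concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. It is not Scrapy's own selector API, and the sample markup, class names, and URLs are hypothetical. The last query selects a parent element by the text of its child, something plain CSS selectors have traditionally not been able to express.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
DOC = """
<html>
  <body>
    <div class="post">
      <h2>First post</h2>
      <a href="/posts/1">Read more</a>
    </div>
    <div class="post featured">
      <h2>Second post</h2>
      <a href="/posts/2">Read more</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Select every <h2> element anywhere below the root.
titles = [h2.text for h2 in root.findall(".//h2")]

# Select links inside the <div> whose class attribute is exactly "post featured".
featured_links = [
    a.get("href") for a in root.findall(".//div[@class='post featured']/a")
]

# Select the <div> that has an <h2> child with the given text.
parent = root.find(".//div[h2='Second post']")

print(titles)
print(featured_links)
print(parent.get("class"))
```

A full XPath engine also offers functions and axes that this standard-library subset leaves out, so treat the sketch as an illustration of the selection style rather than of everything XPath can do.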
During the 2016 Collision Conference held in New Orleans, our Content Strategist Cecilia Haynes interviewed conference speaker Dr. Tyrone Grandison. At the time of the interview, he was the Deputy Chief Data Officer at the U.S. Department of Commerce. Tyrone is currently the Chief Information Officer for the Institute for Health Metrics and Evaluation.
During the 2016 Collision Conference held in New Orleans, Scrapinghub Content Strategist Cecilia Haynes had the opportunity to interview the brains and the brawn behind Up Hail, the rideshare comparison app.
What does “the Future of Work” mean to you? To us, it describes how we approach life at Scrapinghub. We don't work in a traditional office (we're 100% distributed) and we allow folks the freedom to make their own schedules (you know when you work best). By finding ways to break away from the traditional 9-to-5 mode, we ended up creating a framework for the Future of Work.
Python is our go-to language, and Python 2 is losing traction. To survive, older programs need to be Python 3 compatible.
The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners.
It’s the end of an era. Python 2 is on its way out with only a few security and bug fixes forthcoming from now until its official retirement in 2020. Given this withdrawal of support and the fact that Python 3 has snazzier features, we are thrilled to announce that Scrapy Cloud now officially supports Python 3.