How to Build Your Own Price Monitoring Tool

Computers are great at repetitive tasks. They don’t get distracted, bored, or tired. Automation is how you should approach the tedious tasks that are essential to running a successful business, as well as any mundane day-to-day responsibilities. Price monitoring, for example, is a practice that every company should adopt, and one that readily lends itself to automation. In this tutorial, I’ll walk you through how to create your very own price monitoring tool from scratch. While I’m…

Read More

How You Can Use Web Data to Accelerate Your Startup

In the US alone, 27 million individuals were running or starting a new business in 2015. With such a fiercely competitive startup scene, business owners need to take advantage of every resource available, especially given the high probability of failure. Enter web data. Web data is abundant, and those who harness it can do everything from keeping an eye on competitors to ensuring customer satisfaction. Web Data and Web Scraping You can get web data through a process called…

Read More

An Introduction to XPath: How to Get Started

XPath is a powerful language that is often used for scraping the web. It allows you to select nodes or compute values from an XML or HTML document, and it is one of the two languages you can use to extract web data with Scrapy. The other is CSS, and while CSS selectors are a popular choice, XPath actually allows you to do more. With XPath, you can extract data based on text elements’ contents, and not only on…

Read More

Why Promoting Open Data Increases Economic Opportunities

During the 2016 Collision Conference held in New Orleans, our Content Strategist Cecilia Haynes interviewed conference speaker Dr. Tyrone Grandison. At the time of the interview, he was the Deputy Chief Data Officer at the U.S. Department of Commerce. Tyrone is currently the Chief Information Officer for the Institute for Health Metrics and Evaluation. Coming fresh off his talk on “Data science, apps and civic responsibility”, Cecilia was thrilled to chat with Tyrone all about the democratization of…

Read More

Interview: How Up Hail uses Scrapy to Increase Transparency

During the 2016 Collision Conference held in New Orleans, Scrapinghub Content Strategist Cecilia Haynes had the opportunity to interview the brains and the brawn behind Up Hail, the rideshare comparison app. Avi Wilensky is the founder of Up Hail. He sat down with Cecilia and shared how he and his team use Scrapy and web scraping to help users find the best rideshare and taxi deals in real time. Fun fact: Up Hail was named one of Mashable’s 11 Most…

Read More

How to Run Python Scripts in Scrapy Cloud

You can deploy, run, and maintain control over your Scrapy spiders in Scrapy Cloud, our production environment. Keeping control means knowing what’s going on with your spiders and finding out early if they are in trouble. This is one of the reasons why being able to run any Python script in Scrapy Cloud is such a handy feature. You can customize to your heart’s content and automate any crawling-related tasks that you may need…
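The mechanics, roughly, come down to shipping the script with your Scrapy project: declare it in `setup.py`, deploy with `shub`, and it becomes schedulable alongside your spiders. The project name and `bin/healthcheck.py` below are placeholders, and the fragment is a sketch rather than a full project:

```python
# setup.py -- sketch of a Scrapy project that also ships a standalone script.
# 'myproject' and bin/healthcheck.py are placeholder names.
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = myproject.settings']},
    scripts=['bin/healthcheck.py'],  # deployed scripts can be run as jobs
)
```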

Read More

Embracing the Future of Work: How To Communicate Remotely

What does “the Future of Work” mean to you? To us, it describes how we approach life at Scrapinghub. We don’t work in a traditional office (we’re 100% distributed) and we allow folks the freedom to make their own schedules (you know when you work best). By finding ways to break away from the traditional 9-to-5 mode, we ended up creating a framework for the Future of Work. Maybe you’ve heard of this term and want to learn more or maybe you’re…

Read More

How to Deploy Custom Docker Images for Your Web Crawlers

What if you could have complete control over your environment? Your crawling environment, that is… One of the many benefits of our upgraded production environment, Scrapy Cloud 2.0, is that you can customize your crawler runtime environment via Docker images. It’s like a superpower that allows you to use specific versions of Python, Scrapy and the rest of your stack, deciding if and when to upgrade. With this new feature, you can tailor a Docker image to include any dependency…
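A custom image typically starts from one of Scrapinghub’s stack images and layers your own dependencies on top. The sketch below is purely illustrative — the base-image tag and the extra packages are assumptions, not a recommendation:

```dockerfile
# Dockerfile -- illustrative sketch; base-image tag and packages are assumptions
FROM scrapinghub/scrapinghub-stack-scrapy:1.1
# A system-level dependency your spiders might need
RUN apt-get update -qq && apt-get install -y --no-install-recommends libpq-dev
# Python dependencies pinned by your project
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
```

Your project’s `scrapinghub.yml` would then point deployments at the resulting image.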

Read More

Improved Frontera: Web Crawling at Scale with Python 3 Support

Python is our go-to language, and Python 2 is losing traction. To survive, older programs need to be Python 3 compatible. So we’re pleased to announce that Frontera will remain alive and kicking, because it now fully supports Python 3! Joining the ranks of Scrapy and Scrapy Cloud, you can officially continue to quickly create and scale fully formed crawlers without any issues in your Python 3-ready stack. As a key web crawling toolbox…

Read More

How to Crawl the Web Politely with Scrapy

The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners. In this post we’re sharing a few tips for our platform and Scrapy users who want polite and considerate web crawlers. Whether you call them spiders, crawlers, or robots, let’s work together to create a world of Baymaxs,…
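Several of those tips translate directly into Scrapy settings. This `settings.py` fragment is a sketch of polite defaults in that spirit — the specific numbers and the bot URL are judgment calls and placeholders, not prescriptions from the post:

```python
# settings.py -- polite-crawler sketch; the specific values are judgment calls.
ROBOTSTXT_OBEY = True               # respect the site's robots.txt rules
DOWNLOAD_DELAY = 5.0                # pause between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # never hammer one domain in parallel
AUTOTHROTTLE_ENABLED = True         # back off automatically when the site slows
# Identify your bot honestly so site owners can reach you (URL is a placeholder)
USER_AGENT = 'acme-crawler (+https://example.com/bot-info)'
```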

Read More