Author: Cecilia Haynes

Looking Back at 2016

We started 2016 with an eye on blowing 2015 out of the water. Mission accomplished. Together with our users, we crawled more in 2016 than the rest of Scrapinghub’s history combined: a whopping 43.7 billion web pages, resulting in 70.3 billion scraped records! Great work everyone! In what follows, we’ll give you a whirlwind tour of what we’ve been up to in 2016, along with a quick peek at what you can expect in 2017. Platform Scrapy Cloud It’s…

Read More

How to Increase Sales with Online Reputation Management

One negative review can cost your business up to 22% of its prospects. This was one of the sobering findings in a study highlighted on Moz last year. With over half of shoppers rating reviews as important in their buying decision, no company, large or small, can afford to ignore stats like these – let alone the reviews themselves. In what follows, I’ll let you in on how web scraping can help you stay on top. What is online reputation…

Read More

How You Can Use Web Data to Accelerate Your Startup

In the US alone, there were 27 million individuals running or starting a new business in 2015. With this fiercely competitive startup scene, business owners need to take advantage of every resource available, especially given a high probability of failure. Enter web data. Web data is abundant and those who harness it can do everything from keeping an eye on competitors to ensuring customer satisfaction. Web Data and Web Scraping You can get web data through a process called…

Read More

Why Promoting Open Data Increases Economic Opportunities

During the 2016 Collision Conference held in New Orleans, our Content Strategist Cecilia Haynes interviewed conference speaker Dr. Tyrone Grandison. At the time of the interview, he was the Deputy Chief Data Officer at the U.S. Department of Commerce. Tyrone is currently the Chief Information Officer for the Institute for Health Metrics and Evaluation. With Tyrone fresh off his talk on “Data science, apps and civic responsibility”, Cecilia was thrilled to chat with him all about the democratization of…

Read More

Interview: How Up Hail uses Scrapy to Increase Transparency

During the 2016 Collision Conference held in New Orleans, Scrapinghub Content Strategist Cecilia Haynes had the opportunity to interview the brains and the brawn behind Up Hail, the rideshare comparison app. Avi Wilensky is the Founder of Up Hail. Avi sat down with Cecilia and shared how he and his team use Scrapy and web scraping to help users find the best rideshare and taxi deals in real time. Fun fact: Up Hail was named one of Mashable’s 11 Most…

Read More

Embracing the Future of Work: How To Communicate Remotely

What does “the Future of Work” mean to you? To us, it describes how we approach life at Scrapinghub. We don’t work in a traditional office (we’re 100% distributed) and we allow folks the freedom to make their own schedules (you know when you work best). By finding ways to break away from the traditional 9-to-5 mode, we ended up creating a framework for the Future of Work. Maybe you’ve heard of this term and want to learn more or maybe you’re…

Read More

What the Suicide Squad Tells Us About Web Data

Web data is a bit like the Matrix. It’s all around us, but not everyone knows how to use it meaningfully. So here’s a brief overview of the many ways that web data can benefit you as a researcher, marketer, entrepreneur, or even multinational business owner. Since web scraping and web data extraction are sometimes viewed a bit like antiheroes, I’m introducing each of the use cases through characters from the Suicide Squad film. I did my best to pair…

Read More

Introducing the Datasets Catalog

Folks using Portia and Scrapy are engaged in a variety of fascinating web crawling projects, so we wanted to provide you with a way to share your data extraction prowess with the world. With this need in mind, we’re pleased to introduce the latest addition to our Scrapinghub platform: the Datasets Catalog! This new feature allows you to immediately share the results of your Scrapinghub projects as publicly searchable datasets. Not only is this a great way to collaborate with others, you…

Read More

Introducing the Crawlera Dashboard

We’ve been rolling out a lot of updates, upgrades, and new features lately, and we’re continuing this trend by announcing the very first Crawlera Dashboard! Crawlera is a smart downloader that allows you to crawl and scrape websites responsibly. It rotates IP addresses and keeps track of which ones have been blocked by websites, ensuring that your crawls continue uninterrupted. Since Crawlera has always been a mainstay of Scrapinghub, we wanted to revamp its presentation to help you crawl the web…

Read More

Machine Learning with Web Scraping: New MonkeyLearn Addon

Say Hello to the MonkeyLearn Addon We deal in data. Vast amounts of it. But while we’ve been traditionally involved in providing you with the data that you need, we are now taking it a step further by helping you analyze it as well. To this end, we’d like to officially announce the MonkeyLearn integration for Scrapy Cloud. This feature will bring machine learning technology to the data that you extract through Scrapy Cloud. We also offer a MonkeyLearn Scrapy Middleware so…

Read More