A Sneak Peek Inside What Hedge Funds Think of Alternative Financial Data

Unbeknownst to many, there is a data revolution happening in finance. In their never-ending search for alpha, hedge funds and investment banks are increasingly turning to new alternative sources of data to give them an informational edge over the market. On 31 May, Scrapinghub got the chance to see this revolution firsthand. Mills Horton and Thad Chappell of Scrapinghub were invited to Eagle Alpha’s Alternative Data Showcase in New York City, and had some of the leading…

Want to Predict Fitbit’s Quarterly Revenue? Eagle Alpha Did It Using Web Scraped Product Data

Throughout the history of the financial markets, information has been power. The trader with access to the most accurate information can quickly gain an edge over the market. Two hundred years ago, in the age before telegrams and news services, knowing the results of battles, elections and campaigns before anybody else was a huge advantage. Fifty years ago, before Reuters began digitizing company statements, access to company financials gave fundamentals-based investors like Benjamin Graham and Warren Buffett an edge over…

How Data Compliance Companies Are Turning To Web Crawlers To Take Advantage of the GDPR Business Opportunity

Over the last couple of weeks, GDPR has brought data protection center stage. What was once a fringe concern for most businesses became, overnight, a burning problem that needed to be solved immediately. With the sweeping changes it has introduced, GDPR has proven to be a huge headache for companies big and small. However, GDPR has been a goldmine for some savvy companies that positioned themselves to take full advantage of the surge in demand for data compliance solutions….

Looking Back at 2017

It’s been another standout year for Scrapinghub and the scraping community at large. Together we crawled 79.1 billion pages (nearly double 2016), with over 103 billion scraped records; what a year! We’ll do our best here to give you the highlights of 2017 and whet your appetite for what you can expect in 2018 – let’s get into it: What’s New Let’s start with some of what was new in 2017! In July we launched a new offering specifically for…

A Faster, Updated Scrapinghub

We’re very excited to announce a new look for Scrapinghub! We’ve been improving your experience by streamlining common workflows and integrating with common tools (like our integration with GitHub). Today’s release is another step in that direction. Here is what’s new: New styles: We hope you like the new look! Most things are in the same place as before, so we hope it’s a seamless transition. Onboarding: For those new to the platform, there’s an improved onboarding experience…

Scraping the Steam Game Store with Scrapy

This is a guest post from the folks over at Intoli, one of the awesome companies providing Scrapy commercial support and longtime Scrapy fans. Introduction The Steam game store is home to more than ten thousand games and just shy of four million user-submitted reviews. While all kinds of Steam data are available either through official APIs or other bulk-downloadable data dumps, I could not find a way to download the full review dataset. If you want to perform your own…

Do Androids Dream of Electric Sheep?

Machine learning has become very easy to do: you install an ML library like scikit-learn or xgboost, choose an estimator, feed it some training data, and get a model that can be used for predictions. OK, but what’s next? How would you know if it works well? Cross-validation! Good! How would you know that you haven’t messed up the cross-validation? Are there data leaks? If the quality is not good enough, how do you improve it? Are there data preprocessing…

Deploy your Scrapy Spiders from GitHub

Up until now, your deployment process using Scrapy Cloud has probably been something like this: code and test your spiders locally, commit and push your changes to a GitHub repository, and finally deploy them to Scrapy Cloud using shub deploy. However, keeping development and deployment as isolated steps can cause issues, such as unversioned and outdated code running in production. The good news is that, from now on, you can have your code automatically deployed…

Looking Back at 2016

We started 2016 with an eye on blowing 2015 out of the water. Mission accomplished. Together with our users, we crawled more in 2016 than in the rest of Scrapinghub’s history combined: a whopping 43.7 billion web pages, resulting in 70.3 billion scraped records! Great work everyone! In what follows, we’ll give you a whirlwind tour of what we’ve been up to in 2016, along with a quick peek at what you can expect in 2017. Platform Scrapy Cloud It’s…

How to Increase Sales with Online Reputation Management

One negative review can cost your business up to 22% of its prospects. This was one of the sobering findings in a study highlighted on Moz last year. With over half of shoppers rating reviews as important in their buying decision, no company, large or small, can afford to ignore stats like these, let alone the reviews themselves. In what follows, I’ll let you in on how web scraping can help you stay on top. What is online reputation…
