Browsed by
Author: Shane Evans

Scrapinghub Crawls the Deep Web

"The easiest way to think about Memex is: How can I make the unseen seen?"

Announcing Portia, the Open Source Visual Web Scraper!

We’re proud to announce the developer release of Portia, our new open source visual scraping tool based on Scrapy.

Looking Back at 2013

This time last year, Pablo and I were chatting about the previous year and what to expect in 2013. I noticed that our team had almost doubled in size over that year, and we wondered whether that could possibly continue in 2013.

Introducing Dash

We're excited to introduce Dash, a major update to our scraping platform.

Why MongoDB Is a Bad Choice for Storing Our Scraped Data

MongoDB was used early on at Scrapinghub to store scraped data because it's convenient. Scraped data is represented as (possibly nested) records which can be serialized to JSON. The schema is not known ahead of time and may change from one job to the next. We need to support browsing, querying and downloading the stored data. This was very easy to implement using MongoDB (easier than the...

Finding Similar Items

This post describes an approach to the problem of finding near duplicates among crawled items and how this was implemented at Scrapinghub.

Autoscraping casts a wider net

We have recently started letting more users into the private beta for our Autoscraping service. We're receiving a lot of applications following the shutdown of Needlebase and we're increasing our capacity to accommodate these users.

Hello, world

It's finally time to start a Scrapinghub blog! In the upcoming months we expect to open our private beta to new customers, launch new services, add many new features and continue to contribute to open source projects. It's about time we had a way to tell everyone about all the great things that are happening!


