Browsed by
Author: Richard Dowinton

Frontera: The Brain Behind the Crawls

At Scrapinghub we're always building and running large crawls. Last year we had 11 billion requests made on Scrapy Cloud alone. Crawling millions of pages from the internet requires more sophistication than getting a few contacts off a list, as we need to make sure that we get reliable data and up-to-date lists of item pages, and that we are able to optimise our crawl as much as possible.

Scrape Data Visually with Portia and Scrapy Cloud

Note: Portia is no longer available for new users. It has been disabled for all new organisations since August 20, 2018.

It’s been several months since we first integrated Portia into our Scrapy Cloud platform, and last week we officially began to phase out Autoscraping in favor of Portia.

Handling JavaScript in Scrapy with Splash

A common roadblock when developing spiders is dealing with sites that rely heavily on JavaScript. Many modern websites run entirely on JavaScript and require scripts to be run in order for the page to render properly. In many cases, pages also present modals and other dialogs that need to be interacted with to show the full page. In this post we’re going to show you how you can use...

New Changes to Our Scrapy Cloud Platform

We are proud to announce some exciting changes we've introduced this week. These changes bring a much more pleasant user experience and several new features, including the addition of Portia to our platform!

Introducing ScrapyRT: An API for Scrapy spiders

We’re proud to announce our new open source project, ScrapyRT! ScrapyRT, short for Scrapy Real Time, allows you to extract data from a single web page via an API using your existing Scrapy spiders.

Welcome

Here we blog about all things related to web scraping and web data.

If you want to learn more about how you can use web data in your company, check out our Data as a Service page for inspiration.
