Transitioning to Remote Working as a Company

I’d like to echo Joel Gascoigne’s sentiments: This is not normal remote working!

Like Buffer, we’ve been a remote-first company for almost 10 years and we’re also adjusting to the new normal as a result of COVID-19.

A Practical Guide to Web Data QA Part I: Validation Techniques

When it comes to web scraping at scale, there’s a set of challenges you need to overcome just to extract the data. But even once you can get it, the work isn’t done: you need a data QA process in place. Data quality becomes especially crucial if you’re extracting high volumes of data from the web regularly and your team’s success depends on the quality of the scraped data.
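To give a flavor of what record-level validation in such a QA process can look like, here is a minimal sketch in pure Python. The field names and rules are hypothetical examples, not a specific Scrapinghub validation API:

```python
# Hypothetical record-level validation for scraped items.
# Field names and rules are illustrative only.

def validate_record(record):
    """Return a list of validation errors for one scraped item."""
    errors = []
    # Completeness: required fields must be present and non-empty.
    for field in ("url", "title", "price"):
        if not record.get(field):
            errors.append(f"missing field: {field}")
    # Format: price should parse as a non-negative number.
    price = record.get("price")
    if price:
        try:
            if float(price) < 0:
                errors.append("price is negative")
        except (TypeError, ValueError):
            errors.append(f"price is not numeric: {price!r}")
    # Sanity: URL should be an absolute HTTP(S) URL.
    url = record.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        errors.append(f"url is not absolute: {url!r}")
    return errors

good = {"url": "https://example.com/p/1", "title": "Widget", "price": "9.99"}
bad = {"url": "/p/2", "title": "", "price": "free"}
print(validate_record(good))  # []
print(validate_record(bad))   # three errors: title, price, url
```

In practice checks like these run over every batch of extracted data, and the error rate per field is what the QA team monitors over time.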

COVID-19: Handling the Situation as a Fully Remote Company

Scrapinghub is a fully distributed organization with a remote workforce spread across the globe. This structure enables us to continue operating at full capacity during the coronavirus pandemic and to deliver full service to our customers.

Extracting clean article HTML with News API

The Internet offers a vast amount of written content in the form of articles, news, blog posts, stories, essays, and tutorials that can be leveraged by many useful applications:

Job Postings Beta API: Extract Job Postings at Scale

We’re excited to announce our newest data extraction API, Job Postings API. From now on, you can use AutoExtract to extract job postings data from many job boards and recruitment sites, without writing any custom data extraction code!
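As a rough sketch of how an AutoExtract-style request is assembled, the snippet below builds the JSON body for a job-postings query. The endpoint and payload shape follow AutoExtract’s documented request format at the time, but treat the exact details as an assumption and check the current documentation before relying on them:

```python
import json

# Endpoint and field names are assumptions based on the AutoExtract
# docs of the period; verify against the current documentation.
AUTOEXTRACT_URL = "https://autoextract.scrapinghub.com/v1/extract"

def build_job_posting_query(urls):
    """Build the JSON body: one query object per page to extract."""
    return [{"url": url, "pageType": "jobPosting"} for url in urls]

body = build_job_posting_query(["https://example.com/jobs/123"])
print(json.dumps(body, indent=2))
# In a real run this body would be POSTed to AUTOEXTRACT_URL with your
# API key as the HTTP basic-auth username and an empty password.
```

The point of the design is that the only page-specific input is the URL and a `pageType`; all parsing logic lives on the API side, which is what “without writing any custom data extraction code” means here.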

Building spiders made easy: GUI For Your Scrapy Shell

As a Python developer at Scrapinghub, I spend a lot of time in the Scrapy shell, a command-line interface that ships with Scrapy and lets you run simple, spider-compatible code. It gets the job done, sure, but there’s a point where a command-line interface becomes kinda fiddly, and I found I was passing that point pretty regularly. I have some background in tool design and task...