Portia: The Open Source Alternative to Kimono Labs

Attention Kimono users: we’ve created an exporter so you can easily convert your projects from Kimono to Portia!

Imagine your business depended heavily on a third-party tool, and one day that company decided to shut down its service with only two weeks' notice. That, unfortunately, is what happened to users of Kimono Labs yesterday.

And it’s one of the many reasons why we love open source so much.

Portia is an open source visual scraping tool developed by Scrapinghub to make it easier to get data from the web without needing to write a line of code.

You can do anything with Portia that you can do with Kimono Labs, but without vendor lock-in: you can run your spiders on our platform, and you always have the option to move them to your own infrastructure.

Also, we won’t be leaving you high and dry, Kimono users. We’ve developed an exporter to easily migrate your Kimono projects to Portia. Check it out: kimono.scrapinghub.com

Portia’s Features

  • Open source (you won’t need to worry about us shutting down on you)
    • You can always export all your data and your crawler configurations
  • Support for JavaScript-based websites
    • User interactions (such as clicking, scrolling, waiting, and filling in forms) are simulated by recording and replaying user actions on the page
  • Browser-based, so there’s no need for extensions
  • Portia is based on Scrapy and can be extended and customized further using code

Get started with Portia here. And take a look at our Knowledge Base in case you have any questions.

Kimono and Portia

Take a look at Kimono and Portia in action:

Kimono
Portia

Portia 2.0

Our upcoming Portia 2.0 release will include:

  • Extracting multiple items from a list
  • Support for nested items
  • A new, revamped UI shaped by user feedback

And soon after, we’ll also be adding new features like:

  • A visual method of defining links to follow without the need for regular expressions
  • A way to download Portia projects as Scrapy projects that use CSS and XPath selectors

Scalable Platform

Portia is fully integrated into our platform, Scrapy Cloud, but you can also check out the repository and run it locally or on your own server. The benefits of running Portia on Scrapy Cloud include:

  • Robust scheduling
  • On-demand scaling
  • A monitoring add-on that checks whether all the expected items were extracted in each crawl
  • A UI for viewing and comparing extracted items
  • Built-in add-ons for Crawlera and Splash, along with third-party tools

Wrap Up

You can try out Portia here, although you’ll need to register (free!) to deploy your spiders to Scrapy Cloud. Plus, since Portia is open source, we welcome any and all developers who are interested in contributing.

7 thoughts on “Portia: The Open Source Alternative to Kimono Labs”

      1. No lock-in. Portia is open source, so you don’t have to worry about Portia shutting down (as happened with Kimono Labs). You can use our hosted version or run Portia on your own infrastructure whenever you want.
      2. Portia runs on our platform, which means that you benefit from powerful scheduling, on-demand scaling, and monitoring for QA. All of this runs on a robust architecture that crawls more than 3 billion pages per month (about 1.2k pages per second).
      3. It’s browser-based. You don’t need to download an app to get your work done, whereas Import.io requires you to download its app to select the data you want to extract from a page.

      There is an answer on Quora highlighting the differences: https://www.quora.com/Who-are-the-competitors-to-import-io/answer/Valdir-Stumm-Junior

      One thing I haven’t seen Portia do that Import.io can is select portions of text and learn from them across multiple pages. For instance, given a block containing business name, address, city, state, zip, etc., I can split it up in Import.io by selecting the pertinent text, whereas Portia forces me to use regex and/or XPath to do the same thing.

        Feel free to correct me if I’m wrong, though.
