Announcing Portia, the Open Source Visual Web Scraper!

Posted on April 1, 2014

We’re proud to announce the developer release of Portia, our new open source visual scraping tool based on Scrapy. Check out this video:

As you can see, Portia allows you to visually configure what’s crawled and extracted in a very natural way. It provides immediate feedback, making the process of creating web scrapers quicker and easier than ever before!

Portia is available to developers on GitHub. We plan to offer a hosted version on Scrapinghub soon, which will be compatible with Autoscraping and fully integrated with our platform.

Please send us your feedback!

  1. Wow, this looks great! Could be very useful for doing stuff like event scraping.
    Love the heavy bass groove in the video too.

  2. This looks promising. Only a couple days back I was thinking about what needs to catch on for such a scraping service to become available and could not reason as to why this was not released earlier. Great to see that it is finally here and can’t wait to get my hands dirty.

  3. Is this an April Fools' joke or the real deal? I have to ask.

    • Pablo Hoffman permalink

      Real deal! Is it too good to be true? :)

      • Yeah, it is… I’ve been looking for something like it for years.

  4. Jason permalink

    Very useful!!! Looking forward to the release.. 😀

  5. Does this support website authentication as well?

  6. Just came from Reddit, and a user replied, more or less, “People have to seriously think about the date before they release the project.”

    Now that is something to think about.. :)

  7. Reblogged this on Information Simplified and commented:

  8. Guyon permalink

    Looking great, easy to get started… but where is my JSON output stored? I can’t find it, except in the log, mixed with all the other log output.

  9. Congrats on the release!

  10. Albert permalink

    Reblogged this on sonofbluerobot.

  11. Rasmus Wriedt Larsen permalink

    Awesome! I have been waiting for something like this for ages!

  12. foreverscape permalink

    The Russians are gonna love this!

  13. Michael permalink

    Cool! Very similar to

  14. Yosemite Sam permalink

    Installing this is a hair-ripping nightmare. Doesn’t anyone think that installation instructions might be required? Those I’ve found elsewhere leave MUCH to be desired. Bottom line – this tool is completely and utterly useless since it can’t be installed. Prove me wrong with detailed installation instructions – in my case, for Ubuntu 12.04.

    • There are some installation instructions in the GitHub README. The Vagrant VM might be a good option if you are having difficulty. Take a look at the script which should also be useful for your platform if you don’t want to use Vagrant.

      Please keep in mind that this is an early developer release of an open source project. We wanted to share it and get feedback and contributions. Documentation is one of the many things we plan to improve.

  15. BBB permalink

    I need a detailed procedure for installing Portia.

  16. Gman permalink

    When I run the portiacrawl script, it loops and I have to stop it manually. How can I make portiacrawl stop automatically once all items are scraped?

  17. suresh permalink

    Are we able to install Splash in Portia?

  18. Thanks, will definitely give this a try.

  19. For a single page, it is useful. But I want to crawl all the similar pages on a website. I don’t know how to get all the URLs. I can’t find this in the docs.

  20. Reblogged this on critical media review and commented:
    This looks like a very interesting tool:

Trackbacks & Pingbacks

  1. Portia - Un outil de web scrapping visuel « Korben
  2. Portia – Un outil de web scrapping visuel | L'actualité de la High Tech
  3. Portia – Un outil de web scrapping visuel « Mes idées HIGH TECH
  4. Open Source at Scrapinghub | Scrapinghub Blog
  5. Portia : Visual Scraping tool using scrapy | Akash Jains Blog from Dubai
  6. Portia, un web scraper visuel open source » Développeuse Informatique
  7. #5 Notable on the InterWebs | CoderZen
  8. 5 Web Scraping Tools for Extracting Data - CodeCondo
  9. Looking back at 2014 | The Scrapinghub Blog
  10. New Changes to Our Scrapy Cloud Platform | The Scrapinghub Blog
  11. Scrape Data Visually with Portia and Scrapy Cloud | The Scrapinghub Blog
