
Announcing Portia, the Open Source Visual Web Scraper!

Posted on April 1, 2014

We’re proud to announce the developer release of Portia, our new open source visual scraping tool based on Scrapy. Check out this video:

As you can see, Portia allows you to visually configure what’s crawled and extracted in a very natural way. It provides immediate feedback, making the process of creating web scrapers quicker and easier than ever before!

Portia is available to developers on GitHub. We plan to offer a hosted version on Scrapinghub soon, which will be compatible with Autoscraping and fully integrated with our platform.

Please send us your feedback!

  1. Wow, this looks great! Could be very useful for doing stuff like event scraping.
    Love the heavy bass groove in the video too.

  2. This looks promising. Only a couple of days ago I was thinking about what it would take for such a scraping service to become available, and I could not understand why it had not been released earlier. Great to see that it is finally here, and I can’t wait to get my hands dirty.

  3. Is this an April Fools’ joke or the real deal? I have to ask.

    • Pablo Hoffman

      Real deal! Is it too good to be true? :)

      • Yeah, it is… I’ve been looking for something like this for years.

  4. Jason

    Very useful!!! Looking forward to the release. 😀

  5. Does this support website authentication as well?

  6. I just came from Reddit, where a user replied, more or less, “People have to seriously think about the date before they release the project.”

    Now that is something to think about. :)

  7. Reblogged this on Information Simplified and commented:

  8. Guyon

    Looking great, and easy to get started… but where is my JSON output stored? I can’t find it anywhere except in the log, mixed in with all the other log output.


  9. Congrats on the release!

  10. Albert

    Reblogged this on sonofbluerobot.

  11. Rasmus Wriedt Larsen

    Awesome! I have been waiting for something like this for ages!

  12. foreverscape

    The Russians are gonna love this!

  13. Michael

    Cool! Very similar to

  14. Yosemite Sam

    Installing this is a hair-ripping nightmare. Doesn’t anyone think that installation instructions might be required? The ones I’ve found elsewhere leave MUCH to be desired. Bottom line – this tool is completely and utterly useless since it can’t be installed. Prove me wrong with detailed installation instructions, in my case for Ubuntu 12.04.

    • There are some installation instructions in the GitHub README. The Vagrant VM might be a good option if you are having difficulty. Take a look at the script, which should also be useful on your platform if you don’t want to use Vagrant.

      Please keep in mind that this is an early developer release of an open source project. We wanted to share it and get feedback and contributions. Documentation is one of the many things we plan to improve.

  15. BBB

    I need a detailed procedure for installing Portia.

  16. Gman

    When I run the portiacrawl script, it keeps looping and I have to stop it manually. How can I make portiacrawl stop automatically once all items have been scraped?

  17. suresh

    Can we install Splash in Portia?

  18. Thanks, will definitely give this a try!

  19. For a single page, it is useful. But I want to crawl all the similar pages on a website, and I don’t know how to collect all the URLs. I can’t find this in the docs.

  20. Reblogged this on critical media review and commented:
    This looks like a very interesting tool:

  21. Zinc

    Not a very good demo. What about multiple pages that each list similar elements, where you’d like each element as a separate row in a *.csv file, with the same extraction applied to a series of pages? There’s nothing here you can’t do just as easily with any of the other browser scraper plugins. It seems to eliminate the need to “inspect element”, which is a good start, but you can’t tell from this video whether this thing is capable of scraping thousands of data records off dozens of similar pages.

    • Ruairi Fahy

      Hi Zinc,
      This demo shows how to create a sample that can be applied across many pages, and how to follow links.

      Once you have created your spider, which consists of rules for following links, start URLs and samples used for extracting data, you can run it. When you run the spider, it will follow links and extract data until it runs out of links or you tell it to stop. All of the data extracted while the spider is running can be output to a file in CSV, JSON or XML format.

      Portia is a tool for scraping anything from a single page up to a whole site of thousands or millions of items. If you would like to give it a try you can sign up for an account at and use it for free.

      I hope this clears up any questions you have.

  22. Highly descriptive article; I liked it a lot.
    Will there be a part 2?

  23. Anand

    Is the hosted version live already? I have been digging around my Scrapinghub dashboard for an hour now and still haven’t been able to figure out how to get to it without downloading it from GitHub.

    • Hey, Anand!

      You have to create a Portia project in your dashboard: click “Create project” and choose Portia. Then just click on the project to open it in Portia.

      If you are a new user on our platform, you first have to create an organization before you can create a project.

      By the way, we just released a beta version of Portia 2.0 with a lot of interesting features, so give it a try! :)

      • Anand

        Thanks, Vladimir. I honestly have no idea how I missed such an obvious link to Portia.
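The crawl cycle described in the replies above (start URLs, rules for which links to follow, extraction from matching pages, stopping once no links remain, then exporting to a file) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Portia’s implementation: the in-memory `SITE` dictionary is a hypothetical stand-in for downloading pages over HTTP, a URL regex stands in for Portia’s annotated samples, and `crawl`, `to_json` and `to_csv` are hypothetical names. The `seen` set is what keeps a crawl from revisiting pages in a cycle, and the optional `max_items` limit plays a role similar to Scrapy’s `CLOSESPIDER_ITEMCOUNT` setting.

```python
import csv
import io
import json
import re
from collections import deque

# Hypothetical in-memory site: URL -> (page title, outgoing links).
# A real spider would download each page over HTTP instead.
SITE = {
    "http://example.com/": ("Home", ["http://example.com/events/1",
                                     "http://example.com/about"]),
    "http://example.com/events/1": ("Concert", ["http://example.com/events/2",
                                                "http://example.com/"]),
    "http://example.com/events/2": ("Festival", ["http://example.com/events/1"]),
    "http://example.com/about": ("About us", []),
}


def crawl(start_urls, follow_patterns, extract_pattern, max_items=None):
    """Breadth-first crawl from start_urls.

    Links are followed only if they match one of follow_patterns; an item is
    extracted from every visited page whose URL matches extract_pattern.
    The crawl ends when no unvisited links remain, or early once max_items
    items have been collected.
    """
    follow = [re.compile(p) for p in follow_patterns]
    extract = re.compile(extract_pattern)
    queue = deque(start_urls)
    seen = set(start_urls)  # never enqueue a URL twice, so cycles cannot loop forever
    items = []
    while queue:
        url = queue.popleft()
        title, links = SITE.get(url, ("", []))  # stand-in for fetching the page
        if extract.search(url):
            items.append({"url": url, "title": title})
            if max_items is not None and len(items) >= max_items:
                break
        for link in links:
            if link not in seen and any(p.search(link) for p in follow):
                seen.add(link)
                queue.append(link)
    return items


def to_json(items):
    """Serialize extracted items as a JSON array."""
    return json.dumps(items, indent=2)


def to_csv(items):
    """Serialize extracted items as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


if __name__ == "__main__":
    events = crawl(["http://example.com/"], [r"/events/\d+$"], r"/events/")
    print(to_json(events))
    print(to_csv(events))
```

Running it collects the two event pages and prints them as JSON and as CSV. The About page is never visited because its URL does not match the follow pattern, and the home page is visited only as a start URL without producing an item.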

