This year I have reached a major milestone in my life, which is getting my bachelor's degree in mathematics. When I made the decision to go back to college, it was solely because my experience working at Scrapinghub, I figured out that having a math background would be a great foundation for getting into ML-related stuff.
At Scrapinghub we're always building and running large crawls–last year we had 11 billion requests made on Scrapy Cloud alone. Crawling millions of pages from the internet requires more sophistication than getting a few contacts of a list, as we need to make sure that we get reliable data, up to date lists of item pages and are able to optimise our crawl as much as possible.
Note: Portia is no longer available for new users. It has been disabled for all the new organisations from August 20, 2018 onward.
It’s been several months since we first integrated Portia into our Scrapy Cloud platform, and last week we officially began to phase out Autoscraping in favor of Portia.