Interview: How Up Hail uses Scrapy to Increase Transparency
During the 2016 Collision Conference held in New Orleans, Scrapinghub Content Strategist Cecilia Haynes had the opportunity to interview the brains and the brawn behind Up Hail, the rideshare comparison app.
Avi Wilensky is the Founder of Up Hail
Avi sat down with Cecilia and shared how he and his team use Scrapy and web scraping to help users find the best rideshare and taxi deals in real time.
Meet Team Up Hail
CH: Thanks for meeting with me! Can you share a bit about your background, what your company is, and what you do?
AW: We are team Up Hail and we are a search engine for ground transportation like taxis and ride-hailing services. We are now starting to add public transportation like trains and buses, as well as bike shares. We crawl the web using Scrapy and other tools to gather data about who is giving the best rates for certain destinations.
Scrapy for the win
There’s a lot of data out there, especially public transportation data on different government or public websites. This data is unstructured, messy, and not exposed through APIs. Scrapy’s been very useful in gathering it.
CH: How has your rate of growth been so far?
AW: Approximately 100,000 new users a month search our site and app, which is nice and hopefully we will continue to grow. There’s a lot more competition now than when we started, and we’re working really hard to be the leader in this space.
Users come to our site to compare rates and to find the best deals on taxis and ground transportation. They are also interested in finding out if the different service providers are available in their cities. There are many places in the United States and across the world that don’t have these services, so we attract those who want to find out more information.
We also crawl and gather a lot of different product attributes such as economy vs. luxury, shared vs. private, how many people each of these options fit, whether they accept cash, and whether you can book in advance.
Giving users transparency on different car services and transportation options is our mission.
CH: By the way, where are you based?
AW: We’re based in midtown Manhattan in a place called A Space Apart. This is run by a very notable web designer and author named Jeffrey Zeldman who has been gracious enough to host us. He also runs A Book Apart, An Event Apart, and A List Apart, which are some of the most popular communities for web developers and designers.
Why the Team Members at Up Hail are Scrapy Fans
CH: You have really found some creative applications for Scrapy. I have to ask, why Scrapy? What do you appreciate about it?
AW: A lot of the sites that we're crawling are a mess, especially the government transit ones and local taxi companies. As a framework, Scrapy has a lot of features built in right out of the box that are useful for us.
CH: Is there anything in particular that you're like, “I'm obsessed with this aspect of Scrapy?”
AW: We're a Python shop and Scrapy is the Python library for building web crawlers. That's primarily why we use it. Of course, Scrapy has such a vibrant ecosystem of developers and it's just easy to use. The documentation is great and it was super simple to get up and started. It just does the job.
We're grateful that you make such a wonderful tool [Note: We are the original authors and lead maintainers of Scrapy] that is free and open source to startups like us. There's a lot of companies in your space that are charging a lot of money and making it cost prohibitive to use.
CH: That's really great to hear! We're all about open source, so keeping Scrapy forever free is a really important aspect of this approach.
On Being a Python Shop
CH: So tell me a bit more about why you’re a Python shop?
AW: Our application runs on the Python Flask framework and we're using Python libraries to do a lot of the back-end work.
CH: Dare I ask why you’re using Python?
AW: One of the early developers on the project is a Xoogler, and Python is one of Google's primary languages. He really inspired us to use Python and we just love the language because of its philosophy of readability, brevity, and being simple yet powerful enough to get the job done.
I think developer time is scarce and Python makes it faster to deploy, especially for a startup that needs to ship fast.
Introducing Scrapy Cloud and the Scrapinghub Platform
CH: May I ask if you've used our Scrapy Cloud platform to deploy Scrapy crawlers?
AW: We haven't tried it out yet. We just found out about Scrapy Cloud, actually.
CH: Really? Where did you hear about us?
AW: I listen to a Python podcast [Talk Python To Me], and there was an episode with Pablo, one of your co-founders. I didn't know that Scrapy originated from your co-founders. When I saw your name in the Collision Conference app, I was like, "Oh, I know these guys from the podcast! They're maintainers of Scrapy." Now that we know about Scrapy Cloud, we'll give it a try.
We usually run Scrapy locally or we'll deploy Scrapy on an EC2 instance on Amazon Web Services.
CH: Yeah, Scrapy Cloud is our forever free production environment that lets you build, deploy, and scale your Scrapy spiders. We’ve actually just included support for Docker. Definitely let me know what you think of Scrapy Cloud when you use it.
AW: Definitely, I'll have to check it out.
Plans for Up Hail’s Expansion
CH: Where are you hoping to grow within the next five years?
AW: That's a very good question. We're hoping to, of course, expand to more regions. Right now, we're in the United States, Canada, and Europe. There's a lot of other countries that have a tremendous population that we're not covering. We'd like to add a lot more transportation options into the mix. There's all these new things like on-demand helicopters and we want to just show users going from point A to point B all their available options. We're kind of like the Expedia of ground transportation.
Also, we're adding a lot of interesting new things like a scoring system. We're scoring how rideshare-friendly a city is. New York and San Francisco, of course, get 10s, but maybe over in New Jersey, where there are fewer options, some cities will get 6 or 7. It depends on how many options are available. Buffalo, New York, for example, doesn't have Uber or Lyft and they would probably get like a 1 because they only have yellow taxis. This may be useful for users that are thinking of moving to a city and want to know how accessible taxis and rideshares are. We want to give users even more information about taxis and transportation options.
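The interview does not say how the score is computed, only that it depends on how many options are available. As a rough sketch under that single assumption, the function below counts distinct services in a city and caps the result at 10. The function name, the cap, and the one-point-per-service rule are hypothetical, and Up Hail's real scoring may weigh things very differently.

```python
def rideshare_score(available_services, max_score=10):
    """Return a 0..max_score score from the number of distinct services.

    Assumption for illustration: one point per distinct service, capped.
    """
    # Normalize names so "Uber" and " uber " count as one service.
    distinct = {name.strip().lower() for name in available_services if name.strip()}
    return min(len(distinct), max_score)


# A city with only yellow taxis, like the Buffalo example above.
buffalo_like = rideshare_score(["Yellow Taxi"])
```

Under this rule a city with a single service scores 1, in line with the Buffalo example, while a city with ten or more services reaches the cap.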
Increasing Transparency through Web Scraping
CH: It seems that increasing transparency is a large part of where you want to continue to grow.
AW: The transportation industry is not as transparent as it should be. We've heard stories at the Expo Hall [at Collision Conference] of taxi drivers ripping off tourists because they don't know in advance what it's going to cost. By us scraping these sites, taking the rate tables, and computing the estimates, they can just fire up our app and have a good idea of what it's going to cost.
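The step from a scraped rate table to a fare estimate can be sketched in a few lines. The rate structure here (base fare, per-mile rate, per-minute rate, minimum fare) is a common pattern but is an assumption, as are the class and function names and the numbers in the usage example. None of it comes from Up Hail's actual data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateTable:
    """Hypothetical rate table for one provider, in US dollars."""
    provider: str
    base_fare: float
    per_mile: float
    per_minute: float
    minimum_fare: float


def estimate_fare(rate, miles, minutes):
    """Estimate a trip fare, never going below the provider's minimum."""
    raw = rate.base_fare + rate.per_mile * miles + rate.per_minute * minutes
    return round(max(raw, rate.minimum_fare), 2)


def cheapest(rates, miles, minutes):
    """Return the rate table with the lowest estimate for this trip."""
    return min(rates, key=lambda rate: estimate_fare(rate, miles, minutes))


# Made-up numbers, for illustration only.
rates = [
    RateTable("Provider A", base_fare=2.50, per_mile=1.50, per_minute=0.25, minimum_fare=7.00),
    RateTable("Provider B", base_fare=3.00, per_mile=2.00, per_minute=0.00, minimum_fare=5.00),
]
best = cheapest(rates, miles=4, minutes=12)
```

Computing every provider's estimate for the same trip is what lets a rider see the likely cost before getting in the car.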
CH: Is your business model based on something like Expedia’s approach?
AW: Similar. We get a few dollars when we sign up new users to the various providers. We're signing up a few thousand users a month. While it’s been really good so far, we need to grow it tremendously and we're looking for other business models. Also, advertising on the site has been good for us as well, but, of course, it's limited. We don't want to be too intrusive to our users by being overly aggressive with ads, so we're trying to keep it clean there.
Opening Up Up Hail’s API
AW: Within the next few months we hope to launch a public API to outside developers and other sites. We've talked with a lot of other vendors here at the expo like travel concierge apps and the like that want to bring in our data.
CH: Oh, that's great! Seems to be the makings for a lot of cross-platform collaboration.
AW: We've gathered a lot of great data, thanks to Scrapy and other crawling tools, and we hope to make it available for others to use.
In fact, I specifically reached out to you to tell you how awesome Scrapy was.
CH: Well I’m thrilled you did! And I’m so glad we also got to talk about Python and how you use it in your stack.
AW: Definitely. We are heavily using Python to get the job done. We think it's the right tool for the job and for what we're doing.
Team Up Hail at Collision Conference 2016