Price scraping is what you do when you need to extract pricing data from websites. It might look like a minor technical detail, but if you don’t have a reliable way to get price values out of HTML, it can become a recurring headache.
In this article, I will first show you some examples where price scraping is essential for business success. We will then learn how to use our open-source Python library, price-parser, which was built specifically to make it easy to extract clean price information from e-commerce sites.
Why would you want to scrape prices?
If you’re at the beginning of your web scraping journey, here are some examples of how price scraping can help you.
E-commerce Competitor Monitoring
The e-commerce world has become very noisy and competitive. Companies are searching for ways to raise margins, cut expenses and ultimately display the prices that maximize their overall revenue. This is where competitor price monitoring comes in. Most serious online retailers monitor competitor prices on a daily basis in one way or another. Price scraping is a big part of this task: extracting up-to-date data from millions of price points on a regular basis.
Another huge use case of price scraping is brand monitoring. When your brand is visible on multiple platforms online, maintaining price compliance for your product is as important as keeping an eye on the competitor’s pricing. You would ideally want to scrape the product pages that display your products (i.e. your resellers) as well as the competitor’s product data to make sure your pricing strategy is up to date. This would help you establish a competitive price and keep the pricing policy violators in check.
You would also want to scrape prices if you do any kind of e-commerce market research. Whether it’s a one-time project or an ongoing one, if you scrape multiple web pages with different price string formats, you need a reliable way to extract the pricing data.
The Quickest Way To Clean Price Strings
At Scrapinghub we’ve developed our own open-source library for price scraping. You can find it on GitHub, as price-parser. It is capable of extracting price and currency values from raw text strings.
There are two main reasons to use this library:
- Robust price amount and currency symbol extraction (tested on 900+ real-world examples)
- No more struggling with decimal and thousands separators
Install the library with pip:
pip install price-parser
1. Select the HTML element that contains the price (if you're not familiar with Scrapy and/or web scraping, check out the Scrapy documentation)
>>> price_string = response.css('span.price-tag').get()
>>> price_string
2. Use this library to clean up the string
Normally, at this point, you would need to write a custom function, using regex or other Python code, to get the numeric value from the string. With price-parser, you just import the library and call the same function every time:
>>> from price_parser import Price
>>> price = Price.fromstring(price_string)
Then we can retrieve the amount and currency values using attributes:
>>> price.amount        # numeric price amount
Decimal('22.90')
>>> price.amount_text   # price amount, as appears in the string
'22,90'
>>> price.amount_float  # price amount as float, not Decimal
22.9
>>> price.currency      # currency symbol, as appears in the string
'€'
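For comparison, here is a minimal sketch of the kind of hand-written helper mentioned in step 2. It is not part of price-parser; the function name `naive_parse_amount` is hypothetical, and the helper assumes that "." is always the decimal separator and "," is always the thousands separator. That assumption is exactly what breaks on a string like the one above.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical helper for illustration only (not price-parser code).
# Assumption: "." is the decimal separator and "," the thousands separator.
_NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def naive_parse_amount(text: str) -> Optional[Decimal]:
    """Return the first number found in text, or None if there is none."""
    match = _NUMBER_RE.search(text)
    if match is None:
        return None
    # Drop what we assume are thousands separators before converting.
    return Decimal(match.group().replace(",", ""))


print(naive_parse_amount("Price: $1,299.00"))  # 1299.00 (correct)
print(naive_parse_amount("22,90 €"))           # 2290 (wrong: the comma was a decimal separator)
```

The second call shows why a single hard-coded convention is fragile: the same code that handles a US-style price inflates a European-style price a hundredfold.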
The library has been tested with 900+ real-world price strings; see some of the supported cases here.