This is not a tutorial on how to scrape the web. This is something new I’m trying out: I’ve decided to document the whole process of creating my new “business”. And as you’d guess from the post’s title, it’s based on web-scraped data. That’s why I’m documenting it on this blog: I think some of you web scrapers are also interested in building a business around web scraping.
To be honest, I’m just a dude and I know nothing about building a business. I just know how to scrape the web and want to build something interesting with it. I have no idea what I’m doing, but I’m trying to figure it out. This blog post series isn’t gonna be about what I think I would do or what I suppose would work. It’s gonna be about what I’m actually doing right now. I hope some people will find it interesting and get some value out of it.
First of all, what kind of business am I talking about? I’m building pricing intelligence software. It helps ecommerce companies optimize their prices through competitor monitoring and analysis. If you have no clue what I’m talking about, that’s fine. A few months ago I knew nothing about pricing intelligence software either, but I did some research on the topic: I read a bunch of ebooks and PDFs and watched videos about it. So now I understand how this kind of software works and why it is beneficial for ecommerce companies.
The heart of a pricing intelligence platform is web data. If you can’t gather web data at scale, you can’t create a platform like this. Fortunately, I know how to fetch data from websites, so I’ve been figuring out all the other stuff I need to know. Web scraping is the first step, the entry point to building a pricing intelligence platform. What do I scrape? Primarily, I gather information about product prices. At the moment, I only fetch data from marketplace/price comparison websites. Later, I’m gonna gather pricing data directly from the competitors’ own websites.
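To give a feel for what that scraping step looks like, here’s a minimal sketch of extracting product names and prices from a listing page, using only Python’s standard library. The HTML structure and the CSS class names (`product-name`, `product-price`) are made up for illustration; every real marketplace needs its own selectors.

```python
# A toy sketch of price extraction, the core of the crawling engine.
# Class names and markup below are hypothetical, not from a real marketplace.
from html.parser import HTMLParser

class PriceListingParser(HTMLParser):
    """Collects (product_name, price) pairs from a hypothetical listing page."""
    def __init__(self):
        super().__init__()
        self._field = None      # which field the parser is currently inside
        self._current = {}      # partially built listing
        self.listings = []      # completed (name, price) pairs

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if "product-name" in classes:
            self._field = "name"
        elif "product-price" in classes:
            self._field = "price"

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if "name" in self._current and "price" in self._current:
                self.listings.append(
                    (self._current["name"], self._current["price"]))
                self._current = {}

html = """
<div class="item"><span class="product-name">Acme Widget</span>
<span class="product-price">19.99</span></div>
<div class="item"><span class="product-name">Acme Gadget</span>
<span class="product-price">24.50</span></div>
"""

parser = PriceListingParser()
parser.feed(html)
print(parser.listings)  # [('Acme Widget', '19.99'), ('Acme Gadget', '24.50')]
```

In practice I’d fetch the pages over HTTP first, but the parsing logic is the part that differs per site, so that’s what the sketch focuses on.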
The software has 3 main modules. The first one is the crawling engine. It gathers information from marketplace/price comparison websites, then cleans and standardizes the data. Afterwards, all scraped data is sent to a structured database. The second module is the product matching engine. Its job is to match the web shop’s products with the competitors’ products so it’s possible to run a price analysis. This module is not functioning yet, because I currently gather information from marketplace websites, and those already match the products for me. The third one is the analytics engine. It produces the visualizations in the app. Ultimately, this is the module the end user sees and uses.
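To make the “cleans and standardizes” part of the crawling engine concrete, here’s a small sketch of what a normalized price record and a price-string cleaner could look like. The field names, input formats, and the decimal-separator heuristic are my assumptions for illustration, not the actual schema of the product.

```python
# A sketch of the clean-and-standardize step between the crawler and the
# database. Scraped prices arrive as messy strings ("1.299,00 €", "$24.50")
# and should leave as uniform records.
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
import re

@dataclass
class PriceRecord:
    source: str         # which marketplace the price came from
    product_name: str
    price: Decimal      # normalized numeric price
    currency: str
    scraped_at: str     # ISO 8601 timestamp

def normalize_price(raw: str) -> tuple[Decimal, str]:
    """Turn a scraped price string into (Decimal amount, currency code)."""
    currency = "EUR" if "€" in raw else "USD" if "$" in raw else "?"
    digits = re.sub(r"[^\d.,]", "", raw)
    # Heuristic: if a comma comes after the last dot, the comma is the
    # decimal separator (European style); otherwise commas group thousands.
    if "," in digits and digits.rfind(",") > digits.rfind("."):
        digits = digits.replace(".", "").replace(",", ".")
    else:
        digits = digits.replace(",", "")
    return Decimal(digits), currency

price, currency = normalize_price("1.299,00 €")
record = PriceRecord("example-marketplace", "Acme Widget", price, currency,
                     datetime.now(timezone.utc).isoformat())
print(record.price, record.currency)  # 1299.00 EUR
```

Once every price is in this shape, storing it in a structured database and comparing prices across sources becomes straightforward.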
That’s it for now. This was just a brief intro to what I’ve been working on lately. In the next post, I’ll talk about how I “validated the idea”.