How to Scrape Prices from a Website

Website price scraping means collecting the prices listed on a competitor's product pages. A price scraper can gather a large number of prices from any e-Commerce website, and your company can then use that data for efficient price monitoring. The process involves the steps described below:

Steps of Price Scraping Solutions

Create your Price Monitoring Tool to Scrape Prices

There are many web scraping tutorials online. However, writing a new scraper for every kind of e-Commerce website is tedious and expensive. A few basic web-based scrapers can let you scrape prices from an e-Commerce page without that effort.

Web Scraping Tools for Scraping Prices

One web scraping tool designed for price scraping is ScrapeHero Cloud. It lets you scrape prices without writing code, downloading software, or learning a whole new process. The tool comes with pre-built crawlers for popular websites such as Amazon, Walmart, and Target. It also offers scraping APIs that return prices in real time, so you can learn about price trends within minutes.

How to Scrape Prices

In this tutorial, we are going to specifically focus on how to build a basic web scraper for scraping prices from any e-Commerce website you want!

Let us begin by looking at a few product prices displayed on Amazon.com.

Observations and Patterns

A bird's-eye view of the pages reveals a few recurring patterns:

  1. The price is written in digits, never spelled out in words.
  2. The current price is usually shown in the largest font size of any price on the page, so that it stands out for customers.
  3. The price usually appears within the first 600 pixels of the page height.
  4. Generally, the price appears above the other currency figures on the page.

Of course, some websites are exceptions to these patterns. Still, we can combine the observations to build a generic price scraper that works across many different websites.

Implementation of a generic eCommerce scraper to scrape prices

Step 1: Installation

We are using Google Chrome as the web browser. If you do not have it installed, you can download it by following these general installation instructions. More advanced developers can control Chrome programmatically with Puppeteer, a Node.js library. The benefit of using Puppeteer is that you can run the scraper in headless mode, without a GUI.

Step 2: Chrome Developer Tools

The code presented in this blog is kept simple so that you can follow every step in detail. Keep in mind that the code needed to fetch prices can vary from one website to another. For this blog, our main example for price scraping is the Sephora product page on the Amazon platform, opened in Google Chrome. All you need to do is:

  1. Open the Sephora product page in Google Chrome.
  2. Right-click anywhere on the page and select “Inspect” to open Chrome DevTools.
  3. Click on the Console tab of DevTools.
  4. Inside the tab, you can enter any JavaScript code.
  5. Google Chrome will execute the code you enter in the context of the Sephora product page.

Step 3: Run the JavaScript snippet

The third step is to copy the following JavaScript snippet and paste it into the Console tab to scrape the price.

// Collect every element inside <body>
let elements = [
  ...document.querySelectorAll('body *')
]

// Convert an element into a plain record: text, position, and font size
function createRecordFromElement(element) {
  const text = element.textContent.trim()
  var record = {}
  const bBox = element.getBoundingClientRect()
  if (text.length <= 30 && !(bBox.x == 0 && bBox.y == 0)) {
    record['fontSize'] = parseInt(getComputedStyle(element)['fontSize'])
  }
  record['y'] = bBox.y
  record['x'] = bBox.x
  record['text'] = text
  return record
}

let records = elements.map(createRecordFromElement)

// A record can be a price if it is near the top, has a font size, and looks like a currency figure
function canBePrice(record) {
  if (
    record['y'] > 600 ||
    record['fontSize'] == undefined ||
    !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/)
  )
    return false
  else return true
}

let possiblePriceRecords = records.filter(canBePrice)

// Largest font first; for equal font sizes, the record higher on the page first.
// The comparator must return a number, not a boolean.
let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
  if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
  return b['fontSize'] - a['fontSize']
})

console.log(priceRecordsSortedByFontSize[0]['text'])

Press Enter. The price of the product should now be printed in the Console tab. If you do not see the price, the product page you visited is probably an exception to the observations above. This is normal, and you can then try one of the sample pages provided in Step 2.
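When no element passes the filter, the first entry of the sorted array is undefined, so the final console.log line throws a TypeError instead of printing a price. A small guard avoids that; this is only a sketch, and the helper name firstPriceText is an assumption, not part of the original snippet.

```javascript
// Hypothetical helper: returns the text of the best candidate, or null if there is none.
function firstPriceText(sortedRecords) {
  if (!Array.isArray(sortedRecords) || sortedRecords.length === 0) {
    return null
  }
  return sortedRecords[0].text
}

console.log(firstPriceText([]))                    // null
console.log(firstPriceText([{ text: '$19.99' }]))  // $19.99
```

In the console, you could call firstPriceText(priceRecordsSortedByFontSize) in place of the last line of the snippet and treat null as "no price found on this page".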

How It Works:

First, we need to collect all the HTML DOM elements that are present in the product page.

let elements = [
  ...document.querySelectorAll('body *')
]

Next, we convert each of the fetched elements into a plain JavaScript object. Each object stores the information we need about the element: its text, its x and y position, and its font size, in the form {'text': 'Tennis Ball', 'fontSize': 14, 'x': 100, 'y': 200}.

We do this with the following function:

function createRecordFromElement(element) {
  const text = element.textContent.trim() // Fetches text content of the element
  var record = {} // Initiates a simple JavaScript object
  const bBox = element.getBoundingClientRect()
  // getBoundingClientRect is a standard DOM method available in Chrome; it returns
  // an object which contains x, y values, height and width
  if (text.length <= 30 && !(bBox.x == 0 && bBox.y == 0)) {
    record['fontSize'] = parseInt(getComputedStyle(element)['fontSize'])
  }
  // getComputedStyle is a standard browser function; it returns an object with
  // all the style information of the element. Since this function is relatively
  // time-consuming, we are only collecting the font size of elements whose
  // text content length is at most 30 and whose x and y coordinates are not both 0
  record['y'] = bBox.y
  record['x'] = bBox.x
  record['text'] = text
  return record
}

You can convert all the elements into JavaScript objects by applying this function to each element with the JavaScript map function.

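let records = elements.map(createRecordFromElement)

Next, we keep only the records that could be a price. A record qualifies if it sits within the top 600 pixels, has a recorded font size, and its text matches a currency pattern.

function canBePrice(record) {
  if (
    record['y'] > 600 ||
    record['fontSize'] == undefined ||
    !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/)
  )
    return false
  else return true
}

let possiblePriceRecords = records.filter(canBePrice)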
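To get a feel for what the pattern inside canBePrice accepts, it helps to try it on a few strings outside the page. The sketch below uses the same regular expression (without its redundant outer group); the names PRICE_PATTERN and looksLikePrice and the sample strings are made up for illustration.

```javascript
// Same pattern as in canBePrice, with the redundant outer group removed.
const PRICE_PATTERN = /^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$/

// Hypothetical helper: true when the whole string looks like a currency figure.
function looksLikePrice(text) {
  return PRICE_PATTERN.test(text.trim())
}

const samples = [
  '$19.99',        // true
  'US $1,299.00',  // true
  'Rs. 450',       // true
  '120 AED',       // true
  '2019',          // true: a bare number also matches
  '€25',           // false: the euro sign is not in the pattern
  '$19.99 each',   // false: extra text after the figure
  'Free shipping'  // false
]

for (const sample of samples) {
  console.log(sample, '->', looksLikePrice(sample))
}
```

Two things stand out: a bare number such as a year passes the pattern, which is why the font size and position checks matter, and currencies missing from the pattern (the euro here) would have to be added before the scraper can handle them.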

Finally, as observed at the start of the blog, the price is the currency figure with the largest font size. If several currency figures share the largest font size, we pick the one that appears highest on the page, that is, the one with the smallest y value.

We are going to sort the records with the help of the JavaScript sort function.

let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
  // The comparator must return a number: negative puts a first, positive puts b first
  if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
  return b['fontSize'] - a['fontSize']
})
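Array.prototype.sort expects its comparator to return a negative, zero, or positive number rather than a boolean. The following sketch shows the intended ordering on three made-up records; the helper name byFontSizeThenPosition and the sample values are assumptions for illustration only.

```javascript
// Largest font size first; for equal font sizes, the smaller y (higher on the page) first.
function byFontSizeThenPosition(a, b) {
  if (a.fontSize === b.fontSize) return a.y - b.y
  return b.fontSize - a.fontSize
}

// Made-up sample records in the shape produced by createRecordFromElement
const sampleRecords = [
  { text: '$25.00', fontSize: 14, y: 300 },
  { text: '$19.99', fontSize: 24, y: 250 },
  { text: '$21.00', fontSize: 24, y: 120 }
]

// Copy before sorting so the sample array itself is left untouched
const sortedSample = [...sampleRecords].sort(byFontSizeThenPosition)
console.log(sortedSample.map(record => record.text)) // order: $21.00, $19.99, $25.00
```

The two records with font size 24 come first, and among them the one with the smaller y wins, so '$21.00' would be reported as the price.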

Now all that is left is to print the extracted price in the Console tab.

console.log(priceRecordsSortedByFontSize[0]['text'])
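Step 1 mentioned Puppeteer as a way to run the scraper without a GUI. As a rough sketch of how the same logic might be moved out of DevTools, the block below packs the steps into one function and hands it to Puppeteer's page.evaluate. It assumes the puppeteer package is installed (npm install puppeteer); the function names findPriceInPage and scrapePrice, the viewport size, and the example URL are assumptions, not part of the original snippet.

```javascript
// Runs inside the page via page.evaluate, so it must not rely on outer variables.
function findPriceInPage() {
  const pricePattern = /^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$/
  const candidates = []
  for (const element of document.querySelectorAll('body *')) {
    const text = element.textContent.trim()
    const box = element.getBoundingClientRect()
    // Same filters as createRecordFromElement and canBePrice
    if (text.length > 30 || (box.x === 0 && box.y === 0)) continue
    if (box.y > 600 || !pricePattern.test(text)) continue
    const fontSize = parseInt(getComputedStyle(element).fontSize, 10)
    candidates.push({ text, y: box.y, fontSize })
  }
  // Largest font first; ties go to the element higher on the page
  candidates.sort((a, b) => (a.fontSize === b.fontSize ? a.y - b.y : b.fontSize - a.fontSize))
  return candidates.length > 0 ? candidates[0].text : null
}

// Opens the URL in headless Chrome and returns the detected price text (or null).
async function scrapePrice(url) {
  const puppeteer = require('puppeteer') // loaded lazily, only when scraping
  const browser = await puppeteer.launch() // headless by default
  try {
    const page = await browser.newPage()
    await page.setViewport({ width: 1280, height: 800 }) // assumed desktop-sized viewport
    await page.goto(url, { waitUntil: 'networkidle2' })
    return await page.evaluate(findPriceInPage)
  } finally {
    await browser.close()
  }
}

// Usage (not run here): scrapePrice('https://example.com/product').then(console.log)
```

Keeping the page-side logic in a single self-contained function matters because Puppeteer serializes the function and runs it in the browser, where variables from the Node.js script are not available.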

Suggestions:

Here are a few suggestions to keep in mind to improve price scraping results from any e-Commerce product page.

  1. Take more features of the price element into account, such as font weight, font color, etc.
  2. The class names or IDs of the elements that contain product prices will probably include the word “price”. Checking for it is an easy way to locate them.
  3. Currency figures with a strike-through are usually the regular prices shown next to a discounted price, and can be ignored.
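One way to act on these suggestions is to give each candidate a small score based on its hints. The sketch below is only an illustration: the function name scorePriceHints, the shape of the hints object, and the weights are all assumptions, and in the browser the values would come from the element's class attribute, id, and getComputedStyle.

```javascript
// Hypothetical scoring helper; the weights are illustrative, not tuned.
function scorePriceHints(hints) {
  // A struck-through figure is most likely the old price, so rule it out
  if ((hints.textDecorationLine || '').includes('line-through')) return -1

  let score = 0
  const label = `${hints.className || ''} ${hints.id || ''}`.toLowerCase()
  if (label.includes('price')) score += 2 // class name or ID mentions "price"

  // Computed font weight is usually a numeric string such as "700"
  if (parseInt(hints.fontWeight, 10) >= 600 || hints.fontWeight === 'bold') score += 1
  return score
}

console.log(scorePriceHints({ className: 'product-price', fontWeight: '700' }))            // 3
console.log(scorePriceHints({ className: 'price-old', textDecorationLine: 'line-through' })) // -1
console.log(scorePriceHints({ id: 'title' }))                                               // 0
```

Such a score could be used as an extra tie-breaker after font size. Note that a strike-through applied to a parent element may not show up in the computed style of the element itself, so a real check might also need to look at ancestors.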

How Can ITS Help You With Web Scraping Services?

Information Transformation Service (ITS) offers a variety of professional web scraping services, delivered by experienced staff using specialized software. ITS is an ISO-certified company that addresses all of your big data concerns reliably. ITS has served millions of businesses, both established and struggling, helping them reach their goals at an affordable price. To use our professional web scraping services, ask for a free quote!
