SEO Data Scraping

Web data extraction, also known as scraping, is widely used by marketers to collect prices from competitors’ websites. But it is just as useful for other digital marketing specialists, including PPC, SEO, and content marketers. To prove this point, we’ll show you 5 ways to use scraping for comprehensive competitor analysis. We will use Netpeak Spider as the web scraping tool.

1. Scraping Prices from Competitors’ Websites

Scraping competitors’ prices is one of the main routine tasks digital marketers and SEO specialists face in eCommerce projects. The process consists of two parts: first you find the element that holds the value you need, and then you extract it. You can also limit extraction to the categories or pages you need.
In most cases, you need to perform the following actions to scrape prices from a website:

  1. Open a product page.
  2. Find the price and hover over it.
  3. Right-click on it and select ‘Inspect element’.
  4. Scroll to the highlighted line and right-click on it.
  5. Select ‘Copy’ → ‘Copy XPath’.
  6. Launch Netpeak Spider.
  7. Open ‘Settings’ → ‘Scraping’.
  8. Turn on the ‘Use HTML scraping’ option.
  9. Choose the ‘XPath’ search type and enter the code you’ve copied in the ‘Search expressions’ box. Then choose ‘Data extraction’ mode → ‘Inner text’.
  10. Click the ‘OK’ button to save the settings and close the window.
  11. Enter the website URL in the address bar and start scanning with the ‘Start’ button.
  12. After scanning is finished, go to the sidebar and open the ‘Reports’ → ‘Scraping’ tab.
  13. Click the line showing the number of pages that contain the requested data.
  14. Click the ‘Show selected’ button.
  15. Review the report in the new window and export the scraping data with the ‘Export’ button.
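
To make the ‘XPath’ plus ‘Inner text’ combination more concrete, here is a minimal Python sketch of the same idea. It is not how Netpeak Spider works internally. The page fragment, the `extract_inner_text` helper, and the `price` class name are all hypothetical. It also uses the standard library’s limited XPath support, which only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page fragment (not taken from a real site).
SAMPLE_PAGE = """
<html><body>
  <div id="product">
    <h1>Example Kettle</h1>
    <span class="price"><b>49</b>.99 USD</span>
  </div>
</body></html>
"""


def extract_inner_text(markup: str, path: str) -> list[str]:
    """Return the inner text of every element that matches the given path."""
    root = ET.fromstring(markup.strip())
    results = []
    for element in root.findall(path):
        # itertext() walks nested tags too, similar to an 'Inner text' mode.
        text = "".join(element.itertext())
        # Collapse runs of whitespace so values are easy to compare.
        results.append(" ".join(text.split()))
    return results


prices = extract_inner_text(SAMPLE_PAGE, ".//span[@class='price']")
print(prices)
```

Real product pages are rarely well-formed XML, so in practice a dedicated HTML parser or the scraping tool itself does this work.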

You can also use this method to scrape information about product lines, special purchase conditions, and any other product specifications described on product pages.
The scraped data can be used for competitor analysis and for creating a Google AdWords product feed.

2. Most Popular Competitors’ Content Analysis

You cannot ignore your competitors’ moves when creating content in a highly competitive environment. You can use scraping to find your competitors’ most viral publications. This way, you will get an overall picture and identify the common factors behind successful content.
If view, share, like, or repost counters are publicly visible, you can scrape their values as follows:

  1. Open a page with a competitor’s publication.
  2. Find the counter for the indicator you’re interested in.
  3. Hover over its value.
  4. Right-click on it and select ‘Inspect element’.
  5. Copy the XPath.
  6. Set up and launch scraping as shown above.
  7. Export the scraping data.
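
Exported counter values are usually plain strings such as ‘1.2K’, so they need normalizing before you can rank publications. The sketch below shows one way to do that. The `parse_counter` helper, the example URLs, and the sample values are hypothetical. It assumes commas are thousands separators and that ‘K’ and ‘M’ mean thousand and million.

```python
import re

# Assumed suffix meanings; real sites may format counters differently.
_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_counter(raw: str) -> int:
    """Convert a scraped counter such as '1.2K' or '3,400' to an integer."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([kKmM]?)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized counter value: {raw!r}")
    number = float(match.group(1).replace(",", ""))
    return round(number * _MULTIPLIERS[match.group(2).lower()])


# Hypothetical export: publication URL -> scraped share counter.
scraped = {
    "https://example.com/post-a": "1.2K",
    "https://example.com/post-b": "870",
    "https://example.com/post-c": "3,400",
}

# Publications ordered from most to least shared.
ranked = sorted(scraped, key=lambda url: parse_counter(scraped[url]), reverse=True)
for url in ranked:
    print(url, parse_counter(scraped[url]))
```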

3. Google SERP Scraping

You can use scraping to automatically collect data about the top Google search results. You can scrape up to 100 snippets with their URLs, titles, and descriptions. Perform scraping in the following way:

  1. Open the Google search page and type your query.
  2. Go to the search settings page.
  3. Set the number of search results per page you need.
  4. Save the settings and go back to the SERP.
  5. Copy the SERP URL.
  6. Launch Netpeak Spider.
  7. Select ‘List of URLs’ → ‘Enter Manually’ in the Netpeak Spider main menu.
  8. Paste the copied URL into the new window. You can paste several URLs at once, one for each search query you’re interested in.
  9. Open ‘Settings’ → ‘Scraping’.
  10. Turn on the ‘Use HTML scraping’ option.
  11. Set names for the searches, for example SERP Title, SERP Description, and SERP URL.
  12. Choose the ‘XPath’ search type. Then choose ‘Data extraction’ mode → ‘Inner text’.
  13. Enter the following expressions in the search box:
    • for Title scraping:
      //*[@id="rso"]//div[1]/div/div/div/h3/a
    • for Description scraping:
      //*[@id="rso"]//div[1]/div/div/div/div/div/span
    • for URL scraping:
      //*[@id="rso"]//div[1]/div/div/div/h3//@href
  14. Without closing the current window, switch to the ‘User Agent’ tab and choose Chrome as the user agent.
  15. Open the ‘Advanced’ tab and turn off all parameters.
  16. Save the settings.
  17. Go to the sidebar and open the ‘Parameters’ tab. Turn off all parameters except ‘Scraping’.
  18. Start scanning.
  19. When scanning finishes, you will see a column for each search you set up, showing the number of values found.
  20. To see the results of a search, double-click the value you’re interested in.
  21. Review the scraped data in the table that opens.
  22. Click the ‘Report’ button to switch quickly between the results of each search.
  23. To download the results table, click the ‘Export’ button and save it as a file.
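
If you track many queries, the list of SERP URLs to paste into Netpeak Spider can be generated instead of copied by hand. The sketch below is one possible way to do it. The `build_serp_urls` helper is hypothetical. The `num` parameter for results per page is an assumption about Google’s URL format, which Google may change or ignore, so compare the output with a URL copied from your own browser.

```python
from urllib.parse import urlencode


def build_serp_urls(queries, results_per_page=100):
    """Build one Google search URL per query.

    Assumption: 'q' carries the query and 'num' the results per page.
    """
    urls = []
    for query in queries:
        # urlencode escapes spaces and special characters in the query.
        params = urlencode({"q": query, "num": results_per_page})
        urls.append(f"https://www.google.com/search?{params}")
    return urls


serp_urls = build_serp_urls(["seo tools", "content marketing"])
print("\n".join(serp_urls))
```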

4. SEO Competitor Analysis

When testing new ways to improve your website’s optimization, you will naturally wonder what SEO strategy your competitors follow. What technologies do they use? What methods helped them achieve their current results?
Scraping will help you quickly get answers to the following questions:

  • Do your competitors use any specific structured data?
  • What structured data elements are used on competitors’ pages with rich snippets?
  • Do your competitors use external media content (from YouTube or other audio/video platforms), for example on their product pages?
  • What kind of metadata do your competitors use?

You can answer all of these questions by running several searches at once across different competitors’ websites in Netpeak Spider.

4.1. Structured Data Scraping

If you want to know whether your competitor uses a specific Schema element, perform the following actions:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Choose the ‘Contains’ search type. Then choose ‘Data extraction’ mode → ‘All source code’.
  4. If you need to check whether a specific element is present in the page code, enter itemprop="name" in the search box (you can use any other element of the Schema glossary instead of name).
  5. If you need to see the entire structured data of each page, choose the ‘XPath’ search type and ‘Data extraction’ mode → ‘Entire HTML element’.
  6. Enter
    //*[@itemtype]/@itemtype
    in the search box.
  7. Save the settings and start scanning.
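
The two checks above (whether a given itemprop is present, and which itemtype values a page declares) can also be sketched in Python with the standard library. This is only an illustration, not Netpeak Spider’s implementation. The `MicrodataCollector` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class MicrodataCollector(HTMLParser):
    """Collect itemtype and itemprop attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.itemtypes = []
        self.itemprops = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # valueless attributes such as itemscope.
        for name, value in attrs:
            if name == "itemtype" and value:
                self.itemtypes.append(value)
            elif name == "itemprop" and value:
                self.itemprops.append(value)


# Hypothetical product markup with Schema.org microdata.
SAMPLE_PAGE = """<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Example Kettle</span>
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <span itemprop="price">49.99</span>
  </div>
</div>"""

collector = MicrodataCollector()
collector.feed(SAMPLE_PAGE)
print(collector.itemtypes)            # declared Schema.org types
print("name" in collector.itemprops)  # is itemprop="name" present?
```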

4.2. Media Content Search

If you want to find out whether any pages contain content embedded via an iframe (YouTube and Vimeo videos, audio tracks from SoundCloud, Bandcamp, etc.), follow these instructions:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Choose the ‘Contains’ search type and enter the markup to look for (for example, <iframe) in the search box. Then choose ‘Data extraction’ mode → ‘All source code’.
  4. Launch scraping.
  5. Export the scraping data.
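
Once you know which pages contain iframes, it helps to see which platforms the embeds come from. The following sketch collects the host of every iframe source. The `IframeCollector` class, the `embedded_hosts` helper, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class IframeCollector(HTMLParser):
    """Collect the src attribute of every iframe on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def embedded_hosts(markup):
    """Return the host name of each iframe source found in the markup."""
    collector = IframeCollector()
    collector.feed(markup)
    return [urlparse(src).netloc for src in collector.sources]


# Hypothetical page fragment with two embedded players.
SAMPLE_PAGE = """
<p>Product demo:</p>
<iframe src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
<iframe src="https://player.vimeo.com/video/123"></iframe>
"""

print(embedded_hosts(SAMPLE_PAGE))
```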

5. Scraping Customer Reviews

If you work in a niche where review platforms have a big influence, it’s important to monitor reviews of both your company and your competitors. To scrape reviews automatically, you can use Netpeak Spider and perform data extraction with regular expressions. For example, to scrape reviews from G2 Crowd, you need to:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Create searches to extract the positive and negative parts of reviews.
  4. Choose the ‘RegExp’ search type. Then choose ‘Data extraction’ mode → ‘All source code’.
  5. Use regular expression
    (?
  6. Save the settings, enter a list of G2 Crowd pages with reviews, and start scraping.
  7. Export the results as explained in section 3.
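
To illustrate the general idea of pulling the positive and negative parts of reviews with regular expressions, here is a minimal sketch. It does not reproduce the expression used in the steps above and does not reflect G2 Crowd’s real markup. The `pros` and `cons` class names, the sample HTML, and both patterns are hypothetical.

```python
import re

# Hypothetical review markup; a real review site will look different.
SAMPLE_PAGE = """
<div class="review">
  <p class="pros">Fast crawling and flexible exports.</p>
  <p class="cons">Steep learning curve.</p>
</div>
<div class="review">
  <p class="pros">Clear reports.</p>
  <p class="cons">Windows only.</p>
</div>
"""

# One pattern per search: the capturing group holds the review text.
# The non-greedy (.*?) stops at the first closing tag.
PROS_PATTERN = re.compile(r'<p class="pros">(.*?)</p>', re.DOTALL)
CONS_PATTERN = re.compile(r'<p class="cons">(.*?)</p>', re.DOTALL)

pros = PROS_PATTERN.findall(SAMPLE_PAGE)
cons = CONS_PATTERN.findall(SAMPLE_PAGE)

print(pros)
print(cons)
```

Regular expressions break easily when markup changes, so re-check the patterns against the live page source before each run.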

Summary

The practical use of scraping is not limited to price extraction. We’ve described the scraping procedure in Netpeak Spider to show that it can solve many different everyday marketing tasks:

  • Scraping prices and product data from competitors’ websites.
  • Analyzing competitors’ most popular content.
  • Google SERP scraping.
  • SEO competitor analysis.
  • Scraping customer reviews from specialized niche websites.

All Plerdy blog readers can buy Netpeak Software products, including Netpeak Spider, with a 10% discount. Follow this link or use discount code 26618a85 to activate it.

Article by: Andrew Chornyy, CEO of Plerdy, an expert in SEO & CRO with over 14 years of experience.