SEO Data Scraping

Andrew Chornyy

CEO of Plerdy, expert in SEO and CRO with over 14 years of experience.


Web data extraction, also known as scraping, is widely used by marketers to collect prices from competitors’ websites. However, it can also be useful for other digital marketing specialists, including PPC, SEO, and content marketers. To prove this point, we’ll show you five ways to use scraping for comprehensive competitor analysis. We will use Netpeak Spider as a web scraping tool.

Leveraging Scraping for Market Trend Analysis


Understanding market trends is crucial for digital marketers who want to stay ahead of the competition. Scraping data from various sources allows marketers to gain insights into emerging trends, customer preferences, and industry developments. By collecting data from blogs, news sites, and social media platforms, marketers can identify trending topics, popular hashtags, and emerging industry buzzwords. This information can inform content creation, advertising strategies, and product development.

To perform market trend analysis, follow these steps with Netpeak Spider:

  1. Identify Sources: List relevant websites, blogs, and social media platforms where your audience frequently engages.
  2. Set Up Scraping: Configure Netpeak Spider to extract content headlines, publication dates, and engagement metrics.
  3. Data Analysis: Use the extracted data to identify patterns and trends, such as frequently mentioned topics or spikes in engagement around specific issues.
  4. Strategize: Leverage insights to refine your marketing strategies, align content with audience interests, and anticipate future trends.
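Step 3 is where most of the analysis happens. As a minimal sketch, assuming you have exported the scraped headlines into a Python list (the HEADLINES data and the STOP_WORDS set below are hypothetical), counting term frequency is one simple way to surface frequently mentioned topics:

```python
import re
from collections import Counter

# Hypothetical headlines standing in for data exported from a crawl.
HEADLINES = [
    "AI tools reshape SEO audits",
    "How AI changes content marketing",
    "Core Web Vitals and SEO in 2024",
    "Content marketing trends: AI and video",
]

# A small, illustrative stop-word list; extend it for real data.
STOP_WORDS = {"a", "an", "and", "the", "how", "in", "of", "to", "for"}


def top_terms(headlines, n=3):
    """Return the n most frequent non-stop-word terms across all headlines."""
    counts = Counter()
    for headline in headlines:
        # Lowercase, then split on anything that is not a letter or digit.
        words = re.findall(r"[a-z0-9]+", headline.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(HEADLINES):
        print(f"{term}: {count}")
```

Counting single words is deliberately crude; to spot spikes rather than totals, you would likely group the counts by the publication dates you scraped and compare time windows.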

By systematically analyzing market trends, businesses can make informed decisions, optimize their marketing efforts, and stay competitive. Scraping offers a data-driven approach to understanding the dynamic market landscape, ensuring that your strategies are both proactive and responsive to the ever-changing digital environment.

1. Scraping Prices from Competitors’ Websites

Scraping competitors’ prices is one of the most common routine tasks digital marketers and SEO specialists face in eCommerce projects. The process has two parts: finding the element that holds the value (data) you need and extracting that value. You can also limit extraction to specific categories or pages.
In most cases, you need to perform the following actions to scrape prices from a website:

  1. Open a product page.
  2. Find the price and hover over it.
  3. Right-click on it and select ‘Inspect element’.
  4. Scroll to a highlighted line and right-click on it.
  5. Select ‘Copy’ → ‘Copy XPath’.

  6. Launch Netpeak Spider.
  7. Open ‘Settings’ → ‘Scraping’.
  8. Turn on the ‘Use HTML scraping’ option.
  9. Choose the ‘XPath’ search type and paste the XPath you’ve copied into the ‘Search expressions’ box. Then choose ‘Data extraction’ mode → ‘Inner text’.
  10. Click the ‘OK’ button to save settings and close the window.
  11. Enter the website URL in the address bar and start scanning with the ‘Start’ button.
  12. When scanning is finished, go to the sidebar and open the ‘Reports’ → ‘Scraping’ tab.
  13. Click the line showing the number of pages that contain the requested data.
  14. Click the ‘Show selected’ button.
  15. Review the report in the new window and export the scraped data with the ‘Export’ button.
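To make the ‘XPath’ plus ‘Inner text’ combination concrete, here is a minimal Python sketch of what that extraction does: find the element an XPath points to, join its text, then turn the price string into a number you can compare. It uses the standard library’s xml.etree.ElementTree, which supports only a limited XPath subset and requires well-formed markup, so the sample page and the span class="price" selector are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, well-formed product markup. Real pages are often not valid XML,
# so a production scraper would need an HTML-tolerant parser instead.
PRODUCT_PAGE = """
<html>
  <body>
    <div class="product">
      <h1>Trail Running Shoes</h1>
      <span class="price">$1,299.99</span>
    </div>
  </body>
</html>
"""


def inner_text(root, path):
    """Return the concatenated inner text of the first element matching path."""
    element = root.find(path)
    if element is None:
        return None
    return "".join(element.itertext()).strip()


def parse_price(text):
    """Drop currency symbols and thousands separators: '$1,299.99' -> Decimal('1299.99').

    Assumes a dot is the decimal separator.
    """
    digits = re.sub(r"[^\d.]", "", text)
    return Decimal(digits)


if __name__ == "__main__":
    root = ET.fromstring(PRODUCT_PAGE)
    # ElementTree understands only a subset of XPath, so this is a simplified
    # form of the kind of expression you would copy from DevTools.
    raw = inner_text(root, ".//span[@class='price']")
    print(raw, parse_price(raw))
```

parse_price assumes a dot decimal separator; prices formatted like 1.299,99 € would need locale-specific handling.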

You can also use this method to scrape all information about product lines, special purchase conditions, and any other product specifications described on product pages.

By the way, scraped product data can also be used for competitor analysis and for creating Google AdWords product feeds.

2. Most Popular Competitors’ Content Analysis

While creating content in a highly competitive environment, you cannot ignore your competitors’ moves. Scraping helps you find your competitors’ most viral publications, which gives you an overall picture and helps you identify what successful content has in common.
If a page shows public counters for views, shares, likes, or reposts, you can scrape their values as follows:

  1. Open a page with a competitor’s publication.
  2. Find a counter with an indicator you’re interested in.
  3. Hover over its value.
  4. Right-click on it and select ‘Inspect element’.
  5. Right-click the highlighted line and select ‘Copy’ → ‘Copy XPath’.
  6. Set up and launch scraping as described in section 1.
  7. Export the scraped data.
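Public counters are often abbreviated (1.2K, 2M) or use thousands separators, so the scraped values usually need normalizing before you can rank publications. A minimal sketch, assuming the export gives you pairs of publication URL and raw counter text (the sample data is hypothetical):

```python
import re

# Hypothetical scraped values: (publication URL, raw counter text).
SCRAPED = [
    ("https://example.com/post-a", "1.2K"),
    ("https://example.com/post-b", "3,450"),
    ("https://example.com/post-c", "2M"),
    ("https://example.com/post-d", "870"),
]

MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_counter(text):
    """Convert counter text such as '1.2K', '3,450' or '2M' into an integer."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KkMm]?)\s*", text)
    if match is None:
        raise ValueError(f"Unrecognized counter value: {text!r}")
    number, suffix = match.groups()
    # Assumes commas are thousands separators and dots are decimal points.
    value = float(number.replace(",", "")) * MULTIPLIERS[suffix.upper()]
    return round(value)


def rank_by_counter(rows):
    """Sort (url, raw_counter) pairs by numeric counter value, highest first."""
    parsed = ((url, parse_counter(raw)) for url, raw in rows)
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for url, value in rank_by_counter(SCRAPED):
        print(f"{value:>10,}  {url}")
```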

3. Google SERP Scraping

Scraping lets you automatically collect information about the top of Google’s search results: up to 100 snippets with their URLs, titles, and descriptions. Perform scraping as follows:

  1. Open the Google search page and type your query.
  2. Go to the search settings page.
  3. Set the number of search results per page.
  4. Save settings and go back to the SERP.
  5. Copy the SERP URL.
  6. Launch Netpeak Spider.
  7. Select ‘List of URLs’ → ‘Enter Manually’ in the Netpeak Spider main menu.
  8. Paste the copied URL into the new window.
You can paste several SERP URLs at once, one for each search query you’re interested in.
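If you track many queries, generating the SERP URLs is quicker than copying them one by one. A minimal sketch using Python’s urllib.parse.urlencode; the query list is hypothetical, and the optional num parameter is an assumption (Google has historically used it to set results per page, but it may be ignored).

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def serp_url(query, num=None):
    """Build a Google search URL for one query.

    The optional num parameter is an assumption: Google has historically used it
    to set the number of results per page, but it may be ignored.
    """
    params = {"q": query}
    if num is not None:
        params["num"] = num
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    queries = ["web scraping tools", "seo audit checklist"]  # hypothetical queries
    # One URL per line, ready to paste into 'List of URLs' -> 'Enter Manually'.
    print("\n".join(serp_url(q) for q in queries))
```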

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Set names for searches, such as SERP Title, SERP Description, and SERP URL.
  4. Choose the ‘Xpath’ search type. Then choose ‘Data extraction’ mode → ‘Inner text’.
  5. Enter the following expressions in the search boxes:
    • for Title scraping:
      //*[@id="rso"]//div[1]/div/div/div/h3/a
    • for Description scraping:
      //*[@id="rso"]//div[1]/div/div/div/div/div/span
    • for URL scraping:
      //*[@id="rso"]//div[1]/div/div/div/h3//@href
  6. Do not close the current window. Move to the ‘User-Agent’ tab and choose Chrome as a user agent.
  7. Open the ‘Advanced’ tab and turn off all parameters.
  8. Save settings.
  9. Go to the sidebar and open the ‘Parameters’ tab. Turn off all parameters except ‘Scraping’.
  10. Start scanning.
  11. When scanning is finished, you will see columns matching the searches you set earlier, each showing the number of values found.
  12. To see the results of a search, double-click the value you’re interested in.
  13. Review the scraped data in the table that opens.
  14. Click the ‘Report’ button to switch quickly between the results of each search.
  15. To download the results table, click the ‘Export’ button and save it as a file.
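The three searches return separate columns, so to work with complete snippets you need to line them up by position. A minimal sketch, assuming you have the SERP Title, SERP Description, and SERP URL values as Python lists in page order (the sample values are hypothetical):

```python
def combine_snippets(titles, descriptions, urls):
    """Merge three parallel scraping results into position-ranked SERP rows.

    Assumes each expression returned its matches in page order and that the
    three lists line up one-to-one.
    """
    if not (len(titles) == len(descriptions) == len(urls)):
        raise ValueError(
            f"Column lengths differ: {len(titles)} titles, "
            f"{len(descriptions)} descriptions, {len(urls)} URLs"
        )
    return [
        {"position": i, "title": t, "description": d, "url": u}
        for i, (t, d, u) in enumerate(zip(titles, descriptions, urls), start=1)
    ]


if __name__ == "__main__":
    # Hypothetical values standing in for the exported columns.
    rows = combine_snippets(
        ["Example A", "Example B"],
        ["First description", "Second description"],
        ["https://a.example", "https://b.example"],
    )
    for row in rows:
        print(row["position"], row["title"], row["url"])
```

The length check matters because Google changes its SERP markup over time, and an outdated expression can silently miss some results, shifting every later row out of alignment.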

4. SEO Competitor Analysis

When testing new ways to improve your website’s optimization, you will probably wonder what SEO strategy your competitors follow. What technologies do they use? What methods helped them achieve their current results?
Scraping will help you quickly get answers to the following questions:

  • Do your competitors use any specific structured data?
  • What elements of structured data are used on competitors’ pages with rich snippets?
  • Do your competitors use external media content (from YouTube or other audio/video platforms) on their product pages, for example?
  • What kind of metadata do your competitors use?

You can get all answers to these questions using simultaneous searches for different competitors’ websites in Netpeak Spider.
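For the metadata question, here is a minimal sketch of what to look for on a competitor’s page: the title and the meta tags, including Open Graph tags that use property instead of name. It relies on Python’s standard html.parser; the MetaCollector class and the sample head section are hypothetical illustrations.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name=..., Open Graph tags use property=...
            key = attributes.get("name") or attributes.get("property")
            if key and attributes.get("content") is not None:
                self.meta[key] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical head section of a competitor's product page.
PAGE = """
<head>
  <title>Trail Running Shoes | Example Store</title>
  <meta name="description" content="Lightweight shoes for rough terrain.">
  <meta name="robots" content="index, follow">
  <meta property="og:type" content="product">
</head>
"""

if __name__ == "__main__":
    collector = MetaCollector()
    collector.feed(PAGE)
    collector.close()
    print("Title:", collector.title)
    for key, value in collector.meta.items():
        print(f"{key}: {value}")
```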

4.1. Structured Data Scraping

To check whether a competitor uses a specific Schema.org element, do the following:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Choose the ‘Contains’ search type. Then choose ‘Data extraction’ mode → ‘All source code’.
  4. To check whether a specific element appears in the page code, enter itemprop="name" in the search box (you can use any other property from the Schema.org vocabulary instead of name).
  5. To list the structured data types declared on each page instead, choose the ‘XPath’ search type and ‘Data extraction’ mode → ‘Entire HTML element’.
  6. Enter the following expression in the search box:
    //*[@itemtype]/@itemtype

  7. Save settings and start scanning.
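The expression //*[@itemtype]/@itemtype returns the itemtype value of every element that declares one. As a minimal sketch of the same idea outside Netpeak Spider, Python’s standard html.parser can collect those values, along with itemprop names, from a page (the ItemtypeCollector class and the Product markup below are hypothetical examples):

```python
from html.parser import HTMLParser


class ItemtypeCollector(HTMLParser):
    """Collect itemtype values (like //*[@itemtype]/@itemtype) and itemprop names."""

    def __init__(self):
        super().__init__()
        self.itemtypes = []
        self.itemprops = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as itemscope arrive with value None.
            if name == "itemtype" and value:
                self.itemtypes.append(value)
            elif name == "itemprop" and value:
                self.itemprops.add(value)


# Hypothetical competitor page using Schema.org microdata.
PAGE = """
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Trail Running Shoes</h1>
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <span itemprop="price" content="129.99">$129.99</span>
  </div>
</div>
"""

if __name__ == "__main__":
    parser = ItemtypeCollector()
    parser.feed(PAGE)
    print(parser.itemtypes)
    print("uses itemprop='name':", "name" in parser.itemprops)
```

Unlike a ‘Contains’ search for the literal text itemprop="name", the parser only counts real attributes, so it also catches single-quoted values and ignores the string when it appears inside comments.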

4.2. Media Content Search

To find pages with content embedded via iframes (YouTube and Vimeo videos, SoundCloud and Bandcamp audio tracks, etc.), follow these steps:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Choose the ‘Contains’ search type and enter <iframe in the search box. Then choose ‘Data extraction’ mode → ‘All source code’.
  4. Save settings and launch scraping.
  5. Export the scraped data.
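Once you know which pages contain iframes, you may want to see which platforms the embeds come from. A minimal sketch, assuming you have a page’s HTML as a string: it collects every iframe src with Python’s html.parser and maps the host to a platform. The PLATFORMS mapping and the sample page are illustrative, not exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative mapping from embed hosts to platform names (not exhaustive).
PLATFORMS = {
    "youtube.com": "YouTube",
    "youtube-nocookie.com": "YouTube",
    "vimeo.com": "Vimeo",
    "soundcloud.com": "SoundCloud",
    "bandcamp.com": "Bandcamp",
}


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def platform_of(src):
    """Map an iframe URL to a platform by matching its host or a parent domain."""
    host = urlparse(src).hostname or ""
    for domain, name in PLATFORMS.items():
        if host == domain or host.endswith("." + domain):
            return name
    return "Other"


# Hypothetical page with two media embeds and one unrelated iframe.
PAGE = """
<p>Watch the review:</p>
<iframe src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
<iframe src="https://w.soundcloud.com/player/?url=TRACK_URL"></iframe>
<iframe src="https://maps.example.com/embed"></iframe>
"""

if __name__ == "__main__":
    collector = IframeCollector()
    collector.feed(PAGE)
    for src in collector.sources:
        print(platform_of(src), src)
```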

5. Scraping Customer Reviews

If you work in a niche where review platforms have a big influence, it’s important to monitor reviews of both your company and your competitors. To scrape reviews automatically, you can use Netpeak Spider and extract data with regular expressions. For example, to scrape reviews from G2 Crowd, you need to:

  1. Open ‘Settings’ → ‘Scraping’.
  2. Turn on the ‘Use HTML scraping’ option.
  3. Create searches to extract positive and negative review parts.
  4. Choose the ‘RegExp’ search type. Then choose ‘Data extraction’ mode → ‘All source code’.
  5. Use the following regular expression:
    (?
  6. Save settings, enter a list of G2 Crowd pages with reviews, and start scraping.
  7. Export the results as described in section 3.
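As a minimal sketch of regex-based extraction, the code below captures the positive and negative parts of a review from hypothetical markup. The class names review-likes and review-dislikes are placeholders, not G2’s actual HTML, and the patterns are illustrative rather than the expression used in step 5.

```python
import re

# Hypothetical review markup; real G2 HTML differs and changes over time.
REVIEW_HTML = """
<div class="review">
  <div class="review-likes">Fast crawling and flexible exports.</div>
  <div class="review-dislikes">The interface takes time to learn.</div>
</div>
"""

# Non-greedy capture up to the next closing </div>; DOTALL lets '.' match
# across line breaks inside a review.
LIKES = re.compile(r'<div class="review-likes">(.*?)</div>', re.DOTALL)
DISLIKES = re.compile(r'<div class="review-dislikes">(.*?)</div>', re.DOTALL)


def extract(pattern, html):
    """Return every captured review fragment with surrounding whitespace removed."""
    return [fragment.strip() for fragment in pattern.findall(html)]


if __name__ == "__main__":
    print("Positive:", extract(LIKES, REVIEW_HTML))
    print("Negative:", extract(DISLIKES, REVIEW_HTML))
```

The non-greedy (.*?) stops at the first closing </div>, so this works only when the captured block contains no nested div elements; for deeply nested markup, an HTML parser is the safer choice.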

Summary

The practical use of scraping is not limited to price extraction. We’ve walked through scraping with Netpeak Spider to show that it can solve many different everyday marketing tasks:

  • Scraping prices and product data from competitors’ websites.
  • Analyzing most popular competitors’ content.
  • Google SERP scraping.
  • SEO competitor analysis.
  • Scraping customer reviews from specialized niche websites.

By the way, all Plerdy blog readers can buy Netpeak Software products, including Netpeak Spider, with a 10% discount. Use discount code 26618a85 to activate it.