However, there are many web scraping tools and cloud services available, and they vary widely in features. Here I’ll show you one task that such tools handle: scraping Google search engine results (links only) using R Studio.

GoogleSearch-R using R Studio

Here, I’ll show you how to scrape the URLs from the first few pages of Google search results for whatever search query you enter, and store the list in a CSV file for further use.

Why scrape Google Search Engine Results?
The most common reason to scrape Google search engine results (GSERs) is keyword planning and deeper keyword analysis. Another common reason is to monitor the organic search ranking of your website in Google for specific keywords.

The Code
#-load packages
library(RCurl)  # provides getURL()
library(XML)    # provides htmlParse(), xpathApply(), xmlAttrs()

#-function to remove all whitespace from string 'x'
trim <- function(x) { gsub("[[:space:]]", "", x) }

#-function to scrape list of URLs
googleURLs <- function(u){
    ##- parse HTML
    doc <- htmlParse(getURL(u))
    ##- find matching nodes: anchor tags with an HREF attribute inside H3 tags
    attrs <- xpathApply(doc, "//h3//a[@href]", xmlAttrs)

    ##- take the first attribute of each node and keep values containing 'http'
    links <- sapply(attrs, function(x) x[[1]])
    links <- grep("http", links, fixed = TRUE, value = TRUE)

    ##- this is necessary to remove unwanted part of links
    ##- split results with '&' char and keep the first piece
    links <- strsplit(links, "&", fixed = TRUE)
    links <- sapply(links, function(x) x[1])

    ##- split results with '=' char and keep the second piece
    links <- strsplit(links, "=", fixed = TRUE)
    links <- sapply(links, function(x) x[2])

    ##- append list of URLs to googleURLs.csv file
    write.table(links, file = "googleURLs.csv", append = TRUE, sep = ",", row.names = FALSE, col.names = FALSE)
}

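#- Using a for loop, grab links from the first 5 search result pages
for (i in seq(0,40,10)){
    ##- i is the result offset of each page; the two "" strings stand for
    ##- the parts of the search URL before and after that offset
    u <- trim(paste("", i, ""))
    googleURLs(u)
}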
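
The loop above steps an offset from 0 to 40 in tens, one value per results page. As a rough sketch of how the page URLs could be assembled, the snippet below assumes Google's `q` (query) and `start` (result offset) query-string parameters. The helper name `buildSearchURL` and the example query are hypothetical and are not part of the original code.

```r
# Hypothetical helper (not from the original post): build the URL of one
# results page. Assumes the `q` (query) and `start` (offset) parameters.
buildSearchURL <- function(query, start) {
  paste0("https://www.google.com/search?q=",
         utils::URLencode(query, reserved = TRUE),  # percent-encode the query
         "&start=", start)
}

query  <- "web scraping with R"   # example query, chosen for illustration
starts <- seq(0, 40, 10)          # offsets for result pages 1 to 5

# One URL per results page
urls <- vapply(starts, function(s) buildSearchURL(query, s), character(1))
print(urls)
```

Because `paste0()` joins its arguments without a separator, there is no whitespace left to strip from the URL afterwards. Each element of `urls` could then be passed to `googleURLs()`.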

What will the result look like?
Answer: see below (googleURLs.csv)
… cont.
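
To get a feel for what ends up in the CSV, here is a small offline sketch of the link-cleaning steps. The two hrefs are made up and only follow the `/url?q=<target>&<extra parameters>` shape that the splitting logic appears to assume. They are not real search results, and the output goes to a temporary file rather than googleURLs.csv.

```r
# Made-up hrefs in the "/url?q=<target>&<extra>" form assumed by the scraper
hrefs <- c("/url?q=http://example.com/page&sa=U&ved=abc",
           "/url?q=https://example.org/&sa=U&ved=def")

# Same two steps as in googleURLs(): keep the text before the first "&",
# then keep the second piece after splitting on "="
links <- sapply(strsplit(hrefs, "&", fixed = TRUE), function(x) x[1])
links <- sapply(strsplit(links, "=", fixed = TRUE), function(x) x[2])

# Write to a temporary file with the same write.table() arguments,
# then read it back as a one-column table
out <- tempfile(fileext = ".csv")
write.table(links, file = out, append = TRUE, sep = ",",
            row.names = FALSE, col.names = FALSE)
result <- read.csv(out, header = FALSE, stringsAsFactors = FALSE)
print(result)
```

One caveat with this approach: a target URL that itself contains `=` is cut at that character, because only the second piece of the split is kept.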

This GoogleSearch-R code is a small effort to scrape a list of URLs from Google Search. With a few tweaks, you can also grab URLs from other well-known search engines such as Yahoo and Bing.

Mayur Dighe
