selenium - How to collect data of Google Search with BeautifulSoup using Python
I want to know how to collect the URLs and page source of Google search results using BeautifulSoup, visit each of them one by one, and move on to the next Google index pages.
Here is the URL I want to collect from: https://www.google.com/search?q=site%3awww.rashmi.com&rct=j and a screenshot of the results is here: http://www.rashmi.com/blog/wp-content/uploads/2014/11/screencapture-www-google-com-search-1433026719960.png
Here is the code I'm trying:
import time
from urllib.parse import urlparse, parse_qs

def getpagelinks(page):
    # collect every link on the results page that points at www.rashmi.com
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def links(url):
    purl = urlparse(url)
    return parse_qs(purl.query)[0]

def pagesvisit(browser, printinfo):
    pageindex = 1
    visited = []
    time.sleep(5)
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=50hqvdcqjozeogs7uokadg" + str(pageindex) + "&start=10&sa=n")
        plist = []
        count = 0
        pageindex += 1
Try this, it should work:
import time
import random
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

def getpagelinks(page):
    # collect every link on the results page that points at www.rashmi.com
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def links(url):
    purl = urlparse(url)
    return parse_qs(purl.query)

def pagesvisit(browser, printinfo):
    start = 0
    visited = []
    time.sleep(5)
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=v896vdilecpmusk7gdah&start=" + str(start) + "&sa=n")
        plist = []
        count = 0
        # random sleep to make sure the results page finishes loading
        time.sleep(random.randint(1, 5))
        page = BeautifulSoup(browser.page_source, "html.parser")
        start += 10
        if start == 500:
            browser.close()
            return
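For completeness, here is a minimal sketch of how these pieces could be wired together end to end. The Chrome driver setup and the /url?q= unwrapping are assumptions on my part, not part of the snippets above: Google typically wraps organic result links in a /url?q=<real-url>&... redirect, which parse_qs can unwrap. Adjust the browser and query to your environment.

# Minimal end-to-end sketch; Chrome setup and /url?q= unwrapping
# are illustrative assumptions, not part of the original snippets.
import time
import random
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup
from selenium import webdriver

def result_links(page):
    # Google wraps organic results as /url?q=<real url>&...; unwrap them.
    urls = []
    for a in page.find_all('a', href=True):
        query = parse_qs(urlparse(a['href']).query)
        for real_url in query.get('q', []):
            if 'www.rashmi.com' in real_url:
                urls.append(real_url)
    return urls

browser = webdriver.Chrome()  # assumes chromedriver is on PATH
try:
    for start in range(0, 50, 10):  # first five result pages
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&start=" + str(start))
        time.sleep(random.randint(1, 5))  # give the page time to load
        page = BeautifulSoup(browser.page_source, "html.parser")
        for url in result_links(page):
            print(url)
finally:
    browser.quit()

Note that Google paginates results with the start query parameter in steps of 10, which is why start += 10 in the answer above advances one result page at a time.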