selenium - How to collect data of Google Search with BeautifulSoup using Python
I want to know how to collect the URLs and page source of Google search results using BeautifulSoup, visit each of them one by one, and move on to the next Google index pages.
Here is the URL I want to collect from: https://www.google.com/search?q=site%3awww.rashmi.com&rct=j and a screenshot of the results is here: http://www.rashmi.com/blog/wp-content/uploads/2014/11/screencapture-www-google-com-search-1433026719960.png
Here is the code I'm trying:
import time
from urllib.parse import urlparse, parse_qs

def getpagelinks(page):
    # collect every link on the results page that points at www.rashmi.com
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def links(url):
    purl = urlparse(url)
    return parse_qs(purl.query)[0]

def pagesvisit(browser, printinfo):
    pageindex = 1
    visited = []
    time.sleep(5)
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=50hqvdcqjozeogs7uokadg" + str(pageindex) + "&start=10&sa=n")
        plist = []
        count = 0
        pageindex += 1
Try this, it should work:
import time
import random
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

def getpagelinks(page):
    # collect every link on the results page that points at www.rashmi.com
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def links(url):
    purl = urlparse(url)
    return parse_qs(purl.query)

def pagesvisit(browser, printinfo):
    start = 0
    visited = []
    time.sleep(5)
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=v896vdilecpmusk7gdah&start=" + str(start) + "&sa=n")
        plist = []
        count = 0
        # random sleep to make sure the results page finishes loading
        time.sleep(random.randint(1, 5))
        page = BeautifulSoup(browser.page_source, "html.parser")
        start += 10
        if start == 500:
            browser.close()
            return
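For completeness, here is a minimal sketch of how these pieces could be wired together end to end. The Chrome driver setup and the /url?q= unwrapping are assumptions on my part, not part of the snippets above: Google typically wraps organic result links in a /url?q=<real-url>&... redirect, which parse_qs can unwrap. Adjust the browser and query to your environment.

# Minimal end-to-end sketch; Chrome setup and /url?q= unwrapping
# are illustrative assumptions, not part of the original snippets.
import time
import random
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup
from selenium import webdriver

def result_links(page):
    # Google wraps organic results as /url?q=<real url>&...; unwrap them.
    urls = []
    for a in page.find_all('a', href=True):
        query = parse_qs(urlparse(a['href']).query)
        for real_url in query.get('q', []):
            if 'www.rashmi.com' in real_url:
                urls.append(real_url)
    return urls

browser = webdriver.Chrome()  # assumes chromedriver is on PATH
try:
    for start in range(0, 50, 10):  # first five result pages
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&start=" + str(start))
        time.sleep(random.randint(1, 5))  # give the page time to load
        page = BeautifulSoup(browser.page_source, "html.parser")
        for url in result_links(page):
            print(url)
finally:
    browser.quit()

Note that Google paginates results with the start query parameter in steps of 10, which is why start += 10 in the answer above advances one result page at a time.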