ruby - Scraping a webpage with Mechanize and Nokogiri and storing data in an XML doc

August 15, 2011

I am trying to scrape a website and store the data in XML using Mechanize and Nokogiri. I didn't set up a Rails project; I am only using Ruby and IRB.

I wrote this method:

    def mechanize_club
      agent = Mechanize.new
      agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
      form = agent.page.forms.first
      form.field_with(:name => 'codeligue').options[0].select
      form.submit
      page2 = agent.get('http://www.rechercheclub.applipub-fft.fr/rechercheclub/club.do?codeclub=01670001&millesime=2015')
      body = page2.body
      html_body = Nokogiri::HTML(body)
      codeclub = html_body.search('.form').children("tr:first").children("th:first").to_i
      @codeclubs << codeclub
      filepath = '/davidgeismar/documents/codeclubs.xml'
      builder = Nokogiri::XML::Builder.new(encoding: 'UTF-8') do |xml|
        xml.root {
          xml.codeclubs {
            @codeclubss.each do |c|
              xml.codeclub {
                xml.code_ c.code
              }
            end
          }
        }
      end
      puts builder.to_xml
    end

My first problem is that I don't know how to test my code. I call ruby webscrapper.rb in the console; the file is treated, I think, but it doesn't create the XML file in the specified path. Beyond that, I am quite sure this code is wrong, because I didn't get a chance to test it.

Basically, what I am trying to do is submit a form several times:

    agent = Mechanize.new
    agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
    form = agent.page.forms.first
    form.field_with(:name => 'codeligue').options[0].select
    form.submit

I think this code is OK, but I don't want it to select only options[0]. I want it to select an option, scrape the data I need, go back to the page, select options[1], and so on until there are no more options (an iteration, I guess).

"the file is treated, I think, but it doesn't create the XML file in the specified path."

There is nothing in your code that creates a file. You print some output, but you don't open or write to a file.

Perhaps you should read the IO and File documentation and review how you are using your filepath variable?
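As a minimal sketch of writing the output to disk, `File.write` creates (or overwrites) a file with the contents of a string. The helper name `save_xml` and the temporary directory are assumptions made for illustration, and the literal `xml` string stands in for what `builder.to_xml` would return:

```ruby
require 'tmpdir'

# Hypothetical helper: write an XML string to `path`, creating or
# overwriting the file. Returns the number of bytes written.
def save_xml(xml_string, path)
  File.write(path, xml_string)
end

# Stand-in for `builder.to_xml`; in the scraper you would pass that instead.
xml = %(<?xml version="1.0" encoding="UTF-8"?>\n<root>\n  <codeclubs/>\n</root>\n)

# A temporary directory is used here so the sketch runs anywhere;
# the real script would use its own `filepath`.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'codeclubs.xml')
  bytes = save_xml(xml, path)
  puts "wrote #{bytes} bytes to #{path}"
end
```

In the method from the question, the equivalent change would be to replace `puts builder.to_xml` with `File.write(filepath, builder.to_xml)`, provided the directory in `filepath` exists.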

The second problem is that you don't call your method anywhere. Though it's defined, and Ruby will see it and parse the method, it has no idea what you want to do with it unless you invoke the method:

    def mechanize_club
      ...
    end

    mechanize_club()
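As for submitting the form once per option instead of only `options[0]`, one possible approach is to count the options first and then re-fetch the page for each index, so that every submit starts from a fresh form. The sketch below is not tested against the site in the question. The method name `scrape_each_option` is hypothetical, and `agent` is assumed to behave like a `Mechanize.new` instance (`get`, `forms`, `field_with`, `options`, `select`, `submit`):

```ruby
# Submit the first form on `url` once per option of the select named
# `field_name`, yielding the option's value and the page returned by submit.
# `agent` is assumed to be a Mechanize instance (or to quack like one).
def scrape_each_option(agent, url, field_name)
  # Load the page once just to learn how many options the select has.
  first_page = agent.get(url)
  count = first_page.forms.first.field_with(name: field_name).options.size

  count.times do |index|
    # Re-fetch the page on every pass so the form is in its initial state.
    form = agent.get(url).forms.first
    option = form.field_with(name: field_name).options[index]
    option.select
    yield option.value, form.submit
  end
end

# Usage (needs the mechanize gem and network access, so it is left commented):
#   require 'mechanize'
#   url = 'http://www.rechercheclub.applipub-fft.fr/rechercheclub/'
#   scrape_each_option(Mechanize.new, url, 'codeligue') do |value, page|
#     puts "#{value}: #{page.uri}"
#   end
```

Inside the block you would parse `page` with Nokogiri and collect whatever data you need before the next option is submitted.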
