android - How to get element by class name with Jsoup?
I'm trying to extract data from Yahoo Finance pages in Android.
Both the summary page and the historical prices page of a specific stock contain many td class="yfnc_tabledata1" cells, from which I need to extract the numbers. I can extract the data inside these td cells on the historical prices page, like this:
Document document = Jsoup.connect("http://finance.yahoo.com/q/hp?s=lux.mi").get();
Elements html = document.getElementsByClass("yfnc_tabledata1");
But it seems the same snippet does not work for the summary page:
Document document = Jsoup.connect("http://finance.yahoo.com/q?s=lux.mi").get();
Elements html = document.getElementsByClass("yfnc_tabledata1");
After reading other questions I tried the following three approaches, without success:
Elements html = document.select(".yfnc_tabledata1"); // html.size() is 0
Elements html = document.getElementsByAttributeValueContaining("class", "yfnc_tabledata1"); // html.size() is 0
Element el = document.getElementById("table#table1"); Elements html = el.getAllElements(); // error, because el is null
Any idea what I'm doing wrong?
Here is a snippet of the summary page from which I can't extract the data:
<div class="yui-u first yfi-start-content">
<div class="yfi_quote_summary">
<div id="yfi_quote_summary_data" class="rtq_table">
<table id="table1">
<tr> <th scope="row" width="48%">prev close:</th> <td class="yfnc_tabledata1">61.15</td> </tr>
<tr> <th scope="row" width="48%">open:</th> <td class="yfnc_tabledata1">61.45</td> </tr>
<tr> <th scope="row" width="48%">bid:</th> <td class="yfnc_tabledata1"> <span id="yfs_b00_lux.mi">61.20</span> </td> </tr>
<tr> <th scope="row" width="48%">ask:</th> <td class="yfnc_tabledata1"> <span id="yfs_a00_lux.mi">61.30</span> </td> </tr>
<tr> <th scope="row" width="48%">1y target est:</th> <td class="yfnc_tabledata1">n/a</td> </tr>
<tr> <th scope="row" width="48%">beta:</th> <td class="yfnc_tabledata1">n/a</td> </tr>
<tr> <th scope="row" width="54%">next earnings date:</th> <td class="yfnc_tabledata1">n/a</td> </tr>
</table>
<table id="table2">
<tr> <th scope="row" width="48%">day's range:</th> <td class="yfnc_tabledata1"> <span> <span id="yfs_g53_lux.mi">60.75</span> </span> - <span> <span id="yfs_h53_lux.mi">61.60</span> </span> </td> </tr>
<tr> <th scope="row" width="48%">52wk range:</th> <td class="yfnc_tabledata1"> <span>34.74</span> - <span>62.50</span> </td> </tr>
<tr> <th scope="row" width="48%">volume:</th> <td class="yfnc_tabledata1"> <span id="yfs_v53_lux.mi">1,057,884</span> </td> </tr>
<tr> <th scope="row" width="48%">avg vol <span class="small">(3m)</span> :</th> <td class="yfnc_tabledata1">740,908</td> </tr>
<tr> <th scope="row" width="48%">market cap:</th> <td class="yfnc_tabledata1"> <span id="yfs_j10_lux.mi">29.36b</span> </td> </tr>
<tr> <th scope="row" width="48%">p/e <span class="small">(ttm)</span> :</th> <td class="yfnc_tabledata1">42.28</td> </tr>
<tr> <th scope="row" width="48%">eps <span class="small">(ttm)</span> :</th> <td class="yfnc_tabledata1">1.45</td> </tr>
<tr class="end"> <th scope="row" width="48%">div & yield:</th> <td class="yfnc_tabledata1">n/a (n/a) </td> </tr>
</table>
</div>
</div>
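A side note on the third attempt: getElementById expects the bare id ("table1"), not a CSS selector such as "table#table1", so it returns null for the latter regardless of what HTML was downloaded. The sketch below parses a shortened copy of the markup above from a string, so it says nothing about what the live page returns. The class name SummaryTableExample and the helper cells are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SummaryTableExample {
    // Shortened copy of the summary-page markup (two rows only).
    static final String HTML =
        "<table id=\"table1\">"
        + "<tr><th scope=\"row\">prev close:</th>"
        + "<td class=\"yfnc_tabledata1\">61.15</td></tr>"
        + "<tr><th scope=\"row\">bid:</th>"
        + "<td class=\"yfnc_tabledata1\"><span id=\"yfs_b00_lux.mi\">61.20</span></td></tr>"
        + "</table>";

    // Hypothetical helper: returns the data cells of the table with id "table1".
    static Elements cells(String html) {
        Document document = Jsoup.parse(html);
        // Pass the bare id here; "table#table1" would be looked up as a literal id.
        Element table = document.getElementById("table1");
        // CSS selectors belong in select(), not in getElementById().
        return table.select("td.yfnc_tabledata1");
    }

    public static void main(String[] args) {
        for (Element td : cells(HTML)) {
            // text() also returns the text of nested <span> elements.
            System.out.println(td.text());
        }
    }
}
```

If the downloaded document does not contain the table at all, getElementById("table1") still returns null, so a null check is worth adding before calling select.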
Edit 1:
I found out why the same snippet doesn't work on both pages. The snippet works fine for the historical prices page, and if I retrieve the HTML of that page, it is exactly the same HTML I see in Chrome's view-source. That is not the case for the summary page: the HTML I retrieve has nothing to do with the view-source in Chrome. It is retrieving something different, though I can't tell exactly what. So the question is: how come the snippet is not retrieving the correct HTML of the page?
Document document = Jsoup.connect("http://finance.yahoo.com/q?s=lux.mi").get();
String temp = document.html();
Solution found by the OP:
Edit 2 - Solution: in case this happens to anyone else: if Jsoup.connect(url).get() for some reason doesn't retrieve the correct page, first fetch the HTML into a String without using Jsoup, and afterwards parse that String with Jsoup.
// Fetch the page with Apache HttpClient instead of Jsoup.connect()
HttpClient httpClient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://google.com");
HttpResponse response = httpClient.execute(httpGet);
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent();
// Read the response body line by line into a String
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) sb.append(line + "\n");
String html = sb.toString();
is.close();
// Parse the downloaded String with Jsoup
Document document = Jsoup.parse(html);
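The stream-reading step of this workaround can be sketched using only JDK classes, independent of which HTTP client supplies the InputStream. This is a minimal sketch, not the OP's code: the class name HtmlReader and the method readAll are hypothetical, and the charset is passed in by the caller rather than fixed to ISO-8859-1. The try-with-resources block closes the stream even if reading fails.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HtmlReader {
    // Hypothetical helper: reads the whole stream into a String,
    // terminating every line with '\n' as the original loop does.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for entity.getContent(): an in-memory stream with sample markup.
        byte[] bytes = "<html>\n<body>ok</body>\n</html>"
            .getBytes(StandardCharsets.ISO_8859_1);
        String html = readAll(new ByteArrayInputStream(bytes),
                              StandardCharsets.ISO_8859_1);
        System.out.print(html);
        // The resulting String can then be passed to Jsoup.parse(html).
    }
}
```

On Android, StandardCharsets and try-with-resources require API level 19 or later; on older versions, Charset.forName("ISO-8859-1") and an explicit finally block would be the equivalent.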