Error writing data to CSV due to ASCII error in Python


    import requests
    from bs4 import BeautifulSoup
    import csv
    from urlparse import urljoin
    import urllib2

    base_url = 'http://www.baseball-reference.com'
    data = requests.get("http://www.baseball-reference.com/teams/bal/2014-schedule-scores.shtml")
    soup = BeautifulSoup(data.content)
    outfile = open("./balpbp.csv", "wb")
    writer = csv.writer(outfile)

    # Collect the URL of every 'boxscore' link on the schedule page.
    url = []
    for link in soup.find_all('a'):
        if not link.has_attr('href'):
            continue
        if link.get_text() != 'boxscore':
            continue
        url.append(base_url + link['href'])

    # Fetch each box score and write its play-by-play table to the CSV.
    for list in url:
        response = requests.get(list)
        html = response.content
        soup = BeautifulSoup(html)

        table = soup.find('table', attrs={'id': 'play_by_play'})

        list_of_rows = []
        for row in table.find_all('tr'):
            list_of_cells = []
            for cell in row.find_all('td'):
                text = cell.text.replace('&nbsp;', '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        writer.writerows(list_of_rows)

The scraped cells contain values like u'G.\xa0Holland', u'N.\xa0Cruz'...

Here is the error message:

    Traceback (most recent call last):
      File "try.py", line 40, in <module>
        writer.writerows(list_of_rows)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 57: ordinal not in range(128)

When I write the data to the CSV, I end up with data that contains \x... sequences in some of the pieces, which prevents the data from being written to the CSV. How can I change the data to delete that part, or otherwise circumvent this issue?

You cannot pass unicode strings to the csv module in Python 2; you need to encode the strings first:

Note

This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
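The note above can be checked without any scraping. The sketch below assumes only that a cell contains the non-breaking space u'\xa0' named in the traceback; `cell` and `can_encode` are illustrative names, not part of the original script. Encoding to ASCII, which is what the traceback shows being attempted, fails, while an explicit UTF-8 encode succeeds.

```python
# Minimal reproduction: a cell value containing a non-breaking space (U+00A0).
cell = u'G.\xa0Holland'


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`, else False."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = can_encode(cell, 'ascii')  # U+00A0 is outside range(128)
utf8_ok = can_encode(cell, 'utf-8')   # UTF-8 can represent U+00A0
utf8_bytes = cell.encode('utf-8')     # U+00A0 becomes the bytes 0xC2 0xA0
```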

text = cell.text.replace('&nbsp;', '').encode("utf-8") 
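Note that the sample data in the question already contains the character u'\xa0' rather than the literal string '&nbsp;', so the replace call leaves those characters in place; it is the encode call that avoids the error. If the goal is also to get rid of the non-breaking spaces, as the question asks, one option is to swap them for ordinary spaces before encoding. A minimal sketch for Python 2, where `clean_cell` and `clean_row` are hypothetical helper names and not part of the original script:

```python
def clean_cell(text):
    """Swap non-breaking spaces for plain spaces, then encode as UTF-8 bytes."""
    return text.replace(u'\xa0', u' ').encode('utf-8')


def clean_row(cells):
    """Apply clean_cell to every cell of one table row."""
    return [clean_cell(cell) for cell in cells]


# Sample values modeled on the data shown in the question.
row = clean_row([u'G.\xa0Holland', u'N.\xa0Cruz'])
```

In the scraping loop, `list_of_cells.append(clean_cell(cell.text))` would take the place of the replace-and-encode line.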

Output after encoding:

    top of 1st, red sox batting, tied 0-0, orioles' chris tillman facing 1-2-3 "
    t1,0-0,0,---,"7,(2-2) cbbfffx",o,bos,d. nava,c. tillman,2%,52%,groundout: p-1b (p's right)
    t1,0-0,1,---,"4,(1-2) bcfx",,bos,d. pedroia,c. tillman,-2%,50%,single rf (line drive short rf)
    t1,0-0,1,1--,"5,(1-2) cfbft",o,bos,d. ortiz,c. tillman,3%,52%,strikeout swinging
    t1,0-0,2,1--,"4,(0-2) c1cfs",o,bos,m. napoli,c. tillman,2%,55%,strikeout swinging
    ,,,,,,,,,"0 runs, 1 hit, 0 errors, 1 lob. red sox 0, orioles 0."
    "bottom of 1st, orioles batting, tied 0-0, red sox' jon lester facing 1-2-3 "
    b1,0-0,0,---,"4,(1-2) cbfx",o,bal,n. markakis,j. lester,-2%,52%,groundout: 3b-1b (weak 3b)
    b1,0-0,1,---,"6,(3-2) bbffbx",,bal,j. hardy,j. lester,2%,55%,single lf (line drive)
    b1,0-0,1,1--,"4,(1-2) fbsx",o,bal,a. jones,j. lester,-3%,52%,popfly: ss (deep ss)
    b1,0-0,2,1--,"5,(1-2) ffbfs",o,bal,c. davis,j. lester,-2%,50%,strikeout swinging
    ....................................
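The restriction described above is specific to Python 2. On Python 3 the csv module works with text, and the file object does the encoding, so no manual encode step should be needed. A sketch under that assumption; the temporary output path and the sample rows are illustrative only:

```python
import csv
import os
import tempfile

# Sample rows modeled on the data shown in the question.
rows = [['G.\xa0Holland', 'N.\xa0Cruz']]

# Illustrative output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'balpbp.csv')

# Python 3 only: open in text mode with an explicit encoding.
# newline='' lets the csv module control line endings itself.
with open(path, 'w', encoding='utf-8', newline='') as outfile:
    csv.writer(outfile).writerows(rows)

# Read the file back to confirm the non-ASCII characters survived.
with open(path, 'r', encoding='utf-8', newline='') as infile:
    read_back = list(csv.reader(infile))
```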
