Error writing data to CSV due to ASCII error in Python
import requests
from bs4 import BeautifulSoup
import csv
from urlparse import urljoin
import urllib2

base_url = 'http://www.baseball-reference.com'
data = requests.get("http://www.baseball-reference.com/teams/bal/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content)
outfile = open("./balpbp.csv", "wb")
writer = csv.writer(outfile)

# Collect the URL of every box score linked from the schedule page.
url = []
for link in soup.find_all('a'):
    if not link.has_attr('href'):
        continue
    if link.get_text() != 'boxscore':
        continue
    url.append(base_url + link['href'])

# Scrape the play-by-play table from each box score and write it out.
for list in url:
    response = requests.get(list)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'id': 'play_by_play'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    writer.writerows(list_of_rows)

Sample of the data:

u'g.\xa0holland', u'n.\xa0cruz'...
Here is the error message:
Traceback (most recent call last):
  File "try.py", line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 57: ordinal not in range(128)

When I write the data to CSV, I end up with data that contains \x... sequences in some pieces, and that prevents the data from being written to the CSV. How can I change the data to delete that part of it, or otherwise circumvent the issue?
You cannot pass unicode strings to the csv module in Python 2; you need to encode the strings first. From the csv documentation:

Note
This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
text = cell.text.replace(' ', '').encode("utf-8")

Output after encoding:
"top of 1st, red sox batting, tied 0-0, orioles' chris tillman facing 1-2-3 "
t1,0-0,0,---,"7,(2-2) cbbfffx",o,bos,d. nava,c. tillman,2%,52%,groundout: p-1b (p's right)
t1,0-0,1,---,"4,(1-2) bcfx",,bos,d. pedroia,c. tillman,-2%,50%,single rf (line drive short rf)
t1,0-0,1,1--,"5,(1-2) cfbft",o,bos,d. ortiz,c. tillman,3%,52%,strikeout swinging
t1,0-0,2,1--,"4,(0-2) c1cfs",o,bos,m. napoli,c. tillman,2%,55%,strikeout swinging
,,,,,,,,,"0 runs, 1 hit, 0 errors, 1 lob. red sox 0, orioles 0."
"bottom of 1st, orioles batting, tied 0-0, red sox' jon lester facing 1-2-3 "
b1,0-0,0,---,"4,(1-2) cbfx",o,bal,n. markakis,j. lester,-2%,52%,groundout: 3b-1b (weak 3b)
b1,0-0,1,---,"6,(3-2) bbffbx",,bal,j. hardy,j. lester,2%,55%,single lf (line drive)
b1,0-0,1,1--,"4,(1-2) fbsx",o,bal,a. jones,j. lester,-3%,52%,popfly: ss (deep ss)
b1,0-0,2,1--,"5,(1-2) ffbfs",o,bal,c. davis,j. lester,-2%,50%,strikeout swinging
....................................
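
If the goal is to drop the \xa0 characters rather than only encode them, one option is to swap each non-breaking space for a plain space before writing. The following is a minimal sketch, not the original poster's code. The helper names clean_cell and write_rows are hypothetical, the sample row is built from the two values shown in the question, and it assumes Python 3, where the csv module accepts Unicode text directly. On Python 2 you would still call .encode("utf-8") on each cell as shown above.

```python
import csv
import io


def clean_cell(text):
    """Replace non-breaking spaces (U+00A0) with plain spaces and trim the ends."""
    return text.replace(u"\xa0", u" ").strip()


def write_rows(stream, rows):
    """Write rows of text cells to an already-open text stream as CSV."""
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])


# Sample cells taken from the values shown in the question.
rows = [[u"g.\xa0holland", u"n.\xa0cruz"]]

# In-memory demonstration; newline="" lets the csv module control line endings.
buffer = io.StringIO(newline="")
write_rows(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)

# For a real file on Python 3 (the path here is only an example):
# with open("balpbp.csv", "w", newline="", encoding="utf-8") as outfile:
#     write_rows(outfile, rows)
```

Cleaning each cell in one helper keeps the scraping loop unchanged: the only difference is that each cell passes through clean_cell before it reaches the writer.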