perl - How to make the output from Text::CSV utf8?
I have a CSV file, win.csv, whose text is encoded in windows-1252. First I use iconv to convert it to UTF-8.

    $ iconv -o test.csv -f windows-1252 -t utf-8 win.csv

Then I read the converted CSV file with the following Perl script (utfcsv.pl).

    #!/usr/bin/perl
    use utf8;
    use Text::CSV;
    use Encode::Detect::Detector;

    my $csv = Text::CSV->new({ binary => 1, sep_char => ';',});
    open my $fh, "<encoding(utf8)", "test.csv";
    while (my $row = $csv->getline($fh)) {
        my $line = join " ", @$row;
        my $enc = Encode::Detect::Detector::detect($line);
        print "($enc) $line\n";
    }
    $csv->eof || $csv->error_diag();
    close $fh;
    $csv->eol("\r\n");
    exit;

Then the output is the following.

    (UTF-8) .........
    () .....

Namely, the encoding of every line is detected as UTF-8 (or ASCII). But the actual output does not seem to be UTF-8. In fact, if I save the output to a file

    $ ./utfcsv.pl > output.txt

then the encoding of output.txt is detected as windows-1252.

Question: How can I get the output text in UTF-8?
Notes:
- Environment: openSUSE 13.2 x86_64, perl 5.20.1
- I do not use Text::CSV::Encoded because its installation fails. (Also, because test.csv is already converted to UTF-8, it would be strange to use Text::CSV::Encoded.)
- I use the following script to check the encoding. (I also used it to find out the encoding of the initial CSV file win.csv.)

    #!/usr/bin/perl
    use Encode::Detect::Detector;

    # "or" binds loosely enough for die to fire when open fails
    open my $in, "<", "$ARGV[0]" or die "open failed";
    while (my $line = <$in>) {
        my $enc = Encode::Detect::Detector::detect($line);
        chomp $enc;
        if ($enc) {
            print "$enc\n";
        }
    }
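
As a cross-check that does not rely on statistical detection, one could test whether a byte string decodes strictly as UTF-8. The following is only a minimal sketch using the core Encode module; the helper name is_valid_utf8 and the sample byte strings are assumptions made for illustration, not part of the original scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    # Work on a copy: decode() with a CHECK value may modify its argument.
    my ($bytes) = @_;
    return eval { decode( 'UTF-8', $bytes, FB_CROAK ); 1 } ? 1 : 0;
}

# A u with diaeresis as UTF-8 bytes, then as a single windows-1252 byte.
for my $sample ( "\xC3\xBC", "\xFC" ) {
    my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $sample;
    printf "%-6s %s\n", $hex,
        is_valid_utf8($sample) ? 'valid UTF-8' : 'not valid UTF-8';
}
```

Note that pure ASCII input also passes this check, so a positive result only says the bytes are well-formed UTF-8, not that UTF-8 was the intended encoding.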
You have set the encoding of the input file handle (which, by the way, should be <:encoding(utf8), note the colon), but you haven't specified the encoding of the output channel, so Perl will send unencoded character values to the output.
The Unicode values of the characters that fit in a single byte, namely Basic Latin (ASCII) between 0 and 0x7F and Latin-1 Supplement between 0x80 and 0xFF, are very similar to Windows code page 1252. In particular, a small letter u with a diaeresis is 0xFC in both Unicode and CP1252, so the text will look like CP1252 if it is output unencoded, instead of as the two-byte sequence 0xC3 0xBC, which is the same code point encoded in UTF-8.
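
The difference can be made visible by printing the same character through a handle with and without an encoding layer. This is a small sketch rather than part of the original answer; it writes to in-memory file handles so the resulting bytes can be inspected, and the variable names and the hex_bytes helper are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as 0xNN.
sub hex_bytes {
    return join ' ', map { sprintf '0x%02X', ord($_) } split //, $_[0];
}

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS as a Perl character string.
my $u_diaeresis = "\x{FC}";

# In-memory handle with no encoding layer: the character value goes out as is.
open my $raw, '>', \my $raw_bytes or die "open failed: $!";
print {$raw} $u_diaeresis;
close $raw;

# In-memory handle with a UTF-8 layer: the character is encoded on output.
open my $enc, '>:encoding(UTF-8)', \my $utf8_bytes or die "open failed: $!";
print {$enc} $u_diaeresis;
close $enc;

print "no layer: ", hex_bytes($raw_bytes),  "\n";
print "UTF-8   : ", hex_bytes($utf8_bytes), "\n";
```

The first line should show the single byte 0xFC, which a detector would reasonably take for windows-1252, and the second the two bytes 0xC3 0xBC.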
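If you use binmode on STDOUT to set the encoding, the data will be output correctly, but it is simplest to use the open pragma like this

    use open qw/ :std :encoding(utf-8) /;

which will set the encoding for STDIN, STDOUT and STDERR, as well as for any newly opened file handles. That means you don't have to specify it when you open the CSV file, and your code will look like the script below.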
Note that I have also added use strict and use warnings, which are essential in any Perl program. I have also used autodie to remove the need for checks on the status of the IO operations, and I have taken advantage of the way Perl interpolates arrays inside double quotes, putting a space between the elements, which avoids the need for a join call.

    #!/usr/bin/perl
    use utf8;
    use strict;
    use warnings 'all';
    use open qw/ :std :encoding(utf-8) /;
    use autodie;

    use Text::CSV;

    my $csv = Text::CSV->new({ binary => 1, sep_char => ';' });

    open my $fh, '<', 'test.csv';

    while ( my $row = $csv->getline($fh) ) {
        print "@$row\n";
    }

    close $fh;
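
For comparison, the binmode alternative mentioned earlier might look roughly like the sketch below. It is not the form the answer recommends. To keep it runnable without extra modules, it reads sample UTF-8 bytes from an in-memory handle and uses a naive split on ';' as a stand-in for Text::CSV; the sample data and variable names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Sample UTF-8 bytes standing in for the contents of test.csv.
my $sample = "M\xC3\xBCller;Z\xC3\xBCrich\n";

# Without the open pragma, the input handle needs its own explicit layer.
open my $fh, '<:encoding(UTF-8)', \$sample or die "open failed: $!";

my @rows;
while ( my $line = <$fh> ) {
    chomp $line;
    my @fields = split /;/, $line;    # naive split; real code would use Text::CSV
    push @rows, \@fields;
    print "@fields\n";
}
close $fh;
```

The trade-off is that every handle must be given its layer individually, which is why the open pragma is the shorter option when all input and output is UTF-8.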