perl - How to make the output from Text::CSV utf8?


I have a CSV file, win.csv, whose text is encoded in windows-1252. First I use iconv to convert it to UTF-8.

$ iconv -o test.csv -f windows-1252 -t utf-8 win.csv

Then I read the converted CSV file with the following Perl script (utfcsv.pl).

#!/usr/bin/perl
use utf8;
use Text::CSV;
use Encode::Detect::Detector;

my $csv = Text::CSV->new({ binary => 1, sep_char => ';',});
open my $fh, "<encoding(utf8)", "test.csv";

while (my $row = $csv->getline($fh)) {
  my $line = join " ", @$row;
  my $enc = Encode::Detect::Detector::detect($line);
  print "($enc) $line\n";
}

$csv->eof || $csv->error_diag();
close $fh;
$csv->eol("\r\n");
exit;

Then the output is the following.

(UTF-8) ......... () .....

Namely, the encoding of each line is detected as UTF-8 (or ASCII). But the actual output does not seem to be UTF-8. In fact, if I save the output to a file

$ ./utfcsv.pl > output.txt

then the encoding of output.txt is detected as windows-1252.

Question: How can I get the output text in UTF-8?

Notes:

  1. Environment: openSUSE 13.2 x86_64, perl 5.20.1
  2. I do not use Text::CSV::Encoded because its installation fails. (Since test.csv has already been converted to UTF-8, it would be strange to need Text::CSV::Encoded anyway.)
  3. I use the following script to check the encoding. (I also used it to find out the encoding of the initial CSV file, win.csv.)


#!/usr/bin/perl
use Encode::Detect::Detector;

# "or" binds more loosely than "||", so die actually fires when open fails
open my $in, "<", $ARGV[0] or die "open failed";
while (my $line = <$in>) {
  my $enc = Encode::Detect::Detector::detect($line);
  chomp $enc;
  if ($enc) {
    print "$enc\n";
  }
}

You have set the encoding of the input file handle (which, by the way, should be <:encoding(utf8); note the colon) but you haven't specified the encoding of the output channel, so Perl will send unencoded character values to the output.

The Unicode values for characters that fit in a single byte, namely Basic Latin (ASCII) between 0 and 0x7F and Latin-1 Supplement between 0x80 and 0xFF, are very similar to Windows code page 1252. In particular, a small letter u with a diaeresis is 0xFC in both Unicode and CP1252, so the text will look like CP1252 if it is output unencoded, instead of the two-byte sequence 0xC3 0xBC, which is the same code point encoded in UTF-8.
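The byte-level difference described above can be checked directly. The following is a minimal sketch using the core Encode module; the helper name hex_bytes is made up for this illustration and is not part of the code in the question.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS as a one-character Perl string
my $char = "\x{FC}";

# Encode the same character in both encodings and compare the bytes
my $cp1252_hex = hex_bytes( encode('cp1252', $char) );
my $utf8_hex   = hex_bytes( encode('UTF-8',  $char) );

print "cp1252: $cp1252_hex\n";   # one byte:  FC
print "UTF-8:  $utf8_hex\n";     # two bytes: C3 BC
```

Printing $char to a handle that has no encoding layer emits the single byte 0xFC, which is why a detector would guess windows-1252 for the saved output.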

If you use binmode on STDOUT to set the encoding then the data will be output correctly, but it is simplest to use the open pragma like this

use open qw/ :std :encoding(utf-8) /; 

which will set the encoding for STDIN, STDOUT and STDERR, as well as any newly-opened file handles. That means you don't have to specify it when you open the CSV file, and the code will look like the script further down.

Note that I have added use strict and use warnings, which are essential in any Perl program. I have also used autodie to remove the need for checks on the status of all the IO operations, and I have taken advantage of the way Perl interpolates arrays inside double quotes by putting a space between the elements, which avoids the need for a join call.

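#!/usr/bin/perl

use utf8;
use strict;
use warnings 'all';
use open qw/ :std :encoding(utf-8) /;
use autodie;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, sep_char => ';' });

# No layer needed here: the open pragma applies :encoding(utf-8) by default
open my $fh, '<', 'test.csv';

while ( my $row = $csv->getline($fh) ) {
    # Interpolating the array joins its elements with a single space
    print "@$row\n";
}

close $fh;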
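For comparison, the binmode alternative mentioned earlier might look like the sketch below. It affects STDOUT only, so in the CSV script the input handle would still need its own <:encoding(UTF-8) layer when it is opened. The test character is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Add a UTF-8 encoding layer to STDOUT only; other handles are unaffected
binmode STDOUT, ':encoding(UTF-8)';

# U+00FC now leaves the program as the two bytes 0xC3 0xBC
# rather than the single byte 0xFC
print "\x{FC}\n";
```

The open pragma is usually the more convenient choice because it also covers STDIN, STDERR and every handle opened afterwards, whereas binmode has to be applied to each handle separately.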
