Apache Pig - FLATTEN on bag not working as expected
input : a.csv file having map data
[banks#{(bofa),(chase)}]
pig script :
a = load 'a.csv' as (bank_details:map[]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;
output : b :
({(bofa),(chase)})
Applying flatten on the bag:
c = foreach a generate bank_details#'banks' as banks: bag{t:(bank:chararray)};
d = foreach c generate flatten(banks);
output : d :
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POProject (Name: Project[bag][0] - scope-114 Operator Key: scope-114) children: null at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
expected output :
(bofa)
(chase)
If the input file contains the bag directly, as below, it works:
input : a.csv
{(bofa),(chase)}
pig script :
a = load 'a.csv' as (bank_details:bag{t:(bank_name:chararray)});
b = foreach a generate flatten(bank_details) as bank_name;
output : b : generating flattened result
(bofa)
(chase)
Any inputs on why I am not able to flatten the bag in aliases c and d?
The problem here is that when you do not specify the schema of the map, its values default to bytearray, as you can see in the official documentation:
a = load 'a.csv' as (bank_details:map[]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;
describe b;
b: {bank_name: bytearray}
Therefore, when you try to cast it to a bag, it results in a ClassCastException, because a DataByteArray cannot be cast to a DataBag. If you perform a dump on c, it will still work, because you are not doing any real operation on the data, you are merely projecting it. However, once you call the flatten function, it expects to receive a DataBag, and it fails when trying to cast the bytearray to it.
The reason why it works in the second case is that you are correctly indicating the schema, a bag, so it won't take the default value, bytearray:
a = load 'a.csv' as (bank_details:bag{t:(bank_name:chararray)});
Edit
Sorry, I didn't see that in the second case you are not using a map, you are using a bag directly. If you want to use a map, you can, as long as you indicate the schema of its values to avoid the problem mentioned above:
a = load 'a.csv' as (bank_details:map[{(name:chararray)}]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;
dump b;
(bofa)
(chase)
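A quick way to check which type Pig has assigned to the map lookup before calling flatten is to run describe on the projected alias. The following is only a minimal sketch, assuming a.csv holds one map per line as in the question; the aliases untyped, typed, c_untyped and c_typed are made up for illustration.

```pig
-- Assumes a.csv contains lines such as: [banks#{(bofa),(chase)}]

-- Untyped map: no value type is declared for the map
untyped = load 'a.csv' as (bank_details:map[]);
c_untyped = foreach untyped generate bank_details#'banks' as banks;
-- Inspect the type Pig reports for banks before flattening it
describe c_untyped;

-- Typed map: the value type is declared as a bag of (name:chararray) tuples
typed = load 'a.csv' as (bank_details:map[{(name:chararray)}]);
c_typed = foreach typed generate bank_details#'banks' as banks;
-- Compare the type reported here with the untyped case
describe c_typed;
```

If describe reports bytearray for banks, flatten will have nothing to unnest, so it is worth fixing the schema in the load statement first.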