Apache Pig - FLATTEN on a bag not working as expected


Input: a.csv, a file containing map data

[banks#{(bofa),(chase)}] 

Pig script:

a = load 'a.csv' as (bank_details:map[]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;

Output of b:

({(bofa),(chase)}) 

Applying flatten on the bag:

c = foreach a generate bank_details#'banks' as banks:bag{t:(bank:chararray)};
d = foreach c generate flatten(banks);

Output of d:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POProject (Name: Project[bag][0] - scope-114 Operator Key: scope-114) children: null at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)

Expected output:

(bofa)
(chase)

If the input file contains the bag directly, as below:

Input: a.csv

{(bofa),(chase)} 

Pig script:

a = load 'a.csv' as (bank_details:bag{t:(bank_name:chararray)});
b = foreach a generate flatten(bank_details) as bank_name;

Output of b (the flattened result is generated):

(bofa)
(chase)

Any inputs on why I am not able to flatten the bag in aliases c and d?

The problem here is that when you do not specify the schema of the map's values, it defaults to bytearray, as you can see in the official documentation:

a = load 'a.csv' as (bank_details:map[]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;
describe b;
b: {bank_name: bytearray}

Therefore, when you try to cast it to a bag, the result is a ClassCastException, because DataByteArray cannot be cast to DataBag. If you perform a dump on c it will still work, because you are not doing any real operation on the data; you are merely projecting it. However, once you call the flatten function, it expects to receive a DataBag, and it fails when trying to cast the bytearray to it.
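A minimal sketch of how the mismatch could be spotted before flatten is ever called, assuming the same a.csv and the default loader used in the question. It reuses the question's aliases and only adds a describe on the projected field, without the bag annotation:

```pig
-- Sketch only: inspect the declared type of the map value before flattening.
-- Assumes a.csv holds the map data shown in the question.
a = load 'a.csv' as (bank_details:map[]);

-- Project the value without annotating it as a bag.
c = foreach a generate bank_details#'banks' as banks;

-- With an untyped map (map[]), this should report banks as bytearray,
-- which is why a later flatten(banks) cannot treat it as a bag.
describe c;
```

If describe reports bytearray here, the schema has to be declared where the data is loaded; annotating the field in a later foreach does not change what the value actually is at runtime.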

The reason why it works in the second case is that you are correctly indicating the schema (a bag), so the field does not fall back to the default type, bytearray:

a = load 'a.csv' as (bank_details:bag{t:(bank_name:chararray)});

Edit

Sorry, I didn't see that in the second case you are not using a map; you are using a bag directly. If you want to use the map, you can, as long as you indicate its schema to avoid the problem mentioned above:

a = load 'a.csv' as (bank_details:map[{(name:chararray)}]);
b = foreach a generate flatten(bank_details#'banks') as bank_name;
dump b;
(bofa)
(chase)
