Confused with Spark serialization
I need to read several CSV files and convert several columns from String to Double. The code looks like this:
    def f(s: String): Double = s.toDouble

    def readOneFile(path: String) = {
      val data = for {
        line <- sc.textFile(path)
        arr = line.split(",").map(_.trim)
        id = arr(33)
      } yield {
        val countings = ((9 to 14) map arr).toVector map f
        id -> countings.toVector
      }
      data
    }
If I write toDouble explicitly (e.g. the function f in the code above), Spark throws java.io.IOException or java.lang.ExceptionInInitializerError.
However, if I change countings to

    val countings = ((9 to 14) map arr).toVector map (_.toDouble)

then it works fine.
Is the function f serializable?
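One way to probe this locally, using a hypothetical canSerialize helper that is not part of the original post, is to Java-serialize the eta-expanded function, which is roughly what Spark's default closure serializer does:

    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

    // Hypothetical helper: attempt plain Java serialization, which is
    // essentially what Spark's default closure serializer does.
    def canSerialize(obj: AnyRef): Boolean =
      try {
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
        true
      } catch {
        case _: NotSerializableException => false
      }

    // In spark-shell, f is compiled as a method on a REPL wrapper class, so the
    // eta-expanded value `f _` closes over that wrapper instance; whether it
    // serializes depends on everything else the wrapper has captured.
    // canSerialize(f _)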
EDIT: Some people say this is the same as Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects, but then why doesn't it throw a Task not serializable exception?
Scala version: 2.10
Spark version: 1.3.1
Environment: yarn-client
We can move the function f to a companion object. I've also made some transformations to avoid the for loop, though I'm not sure it does exactly what you want. Note, you might want to use spark-csv instead of splitting on commas, but this illustrates it:
    object Panda {
      def f(s: String): Double = s.toDouble
    }

    def readOneFile(path: String) = {
      val input = sc.textFile(path)
      val arrs = input.map(line => line.split(",").map(_.trim))
      arrs.map(arr => (arr(33).toDouble, ((9 to 14) map arr).map(Panda.f).toVector))
    }
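As a minimal sketch of why the move helps (the Helper name is illustrative, not from the original post): a method defined on a class is called through the enclosing instance, so the closure has to ship that instance to the executors, while a method on a Scala object is resolved statically on each executor.

    class Helper {
      def f(s: String): Double = s.toDouble
      // rdd.map(f) expands to rdd.map(s => this.f(s)); the closure drags the
      // whole Helper instance along and fails if Helper is not serializable.
    }

    object Panda {
      def f(s: String): Double = s.toDouble
      // rdd.map(Panda.f) refers to the singleton by name; each executor loads
      // Panda from its own classpath, so nothing extra has to be shipped.
    }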