Confused with Spark serialization


I need to read several CSV files and convert several columns from string to double.

The code looks like this:

  def f(s: String): Double = s.toDouble

  def readOneFile(path: String) = {
    val data = for {
      line <- sc.textFile(path)
      arr = line.split(",").map(_.trim)
      id = arr(33)
    } yield {
      val countings = ((9 to 14) map arr).toVector map f
      id -> countings.toVector
    }
    data
  }

If I write the toDouble conversion as a named function (e.g. function f in the code above), Spark throws java.io.IOException or java.lang.ExceptionInInitializerError.

However, if I change countings to

val countings = ((9 to 14) map arr).toVector map (_.toDouble)

then it works fine.

Is function f serializable?

EDIT:

Some people say this is the same as Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects, but then why doesn't it throw a Task not serializable exception? (The class-vs-object case from that question is sketched below.)

Scala version: 2.10

Spark version: 1.3.1

Environment: yarn-client
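For reference, the class-versus-object situation the linked question describes can be sketched roughly like this (hypothetical names, not the code above): a method defined on a class instance drags the enclosing instance into the closure, while a method on a top-level object does not.

  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical names, only to illustrate the class-vs-object distinction.
  object Helpers {                        // top-level object: no instance state to ship
    def f(s: String): Double = s.toDouble
  }

  class Holder(sc: SparkContext) {        // a plain class instance is not Serializable
    def f(s: String): Double = s.toDouble
    // map(f) expands to map(this.f _), so the whole Holder instance is captured
    // and Spark has to serialize it: the classic "Task not serializable" case.
    def convertBad() = sc.textFile("data.csv").map(f)
    // Referencing the object method keeps the Holder instance out of the closure.
    def convertOk() = sc.textFile("data.csv").map(Helpers.f)
  }

  object ClosureDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("closure-demo"))
      println(new Holder(sc).convertOk().count())
    }
  }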

We can move function f to a companion object. I made some transformations to avoid the for loop, but I'm not sure it does exactly what you want. Note, you might want to use spark-csv instead of splitting on commas, but this illustrates it:

  object Panda {
    def f(s: String): Double = s.toDouble
  }

  def readOneFile(path: String) = {
    val input = sc.textFile(path)
    val arrs = input.map(line => line.split(",").map(_.trim))
    arrs.map(arr => (arr(33).toDouble,
                     ((9 to 14) map arr).map(Panda.f).toVector))
  }
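As a side note on the spark-csv suggestion: a rough sketch of that route could look like the following, assuming the com.databricks:spark-csv package is on the classpath and using the Spark 1.3-era SQLContext.load API (the exact options and method names vary by version, and readOneFileCsv is just a hypothetical name). Columns still come back as strings, so the conversion itself stays the same:

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)

  def readOneFileCsv(path: String) = {
    // Assumes the files have no header line; adjust the options to match the data.
    val df = sqlContext.load(
      "com.databricks.spark.csv",
      Map("path" -> path, "header" -> "false"))

    // Same column positions (33 and 9 to 14) as the split-on-comma version above.
    df.rdd.map(row => (row.getString(33).toDouble,
                       (9 to 14).map(i => Panda.f(row.getString(i))).toVector))
  }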
