hadoop - Add Spark to Oozie shared lib
By default, the Oozie shared lib directory provides libraries for Hive, Pig, and MapReduce. If I want to run a Spark job on Oozie, it might be better to add the Spark lib jars to Oozie's shared lib instead of copying them to the app's lib directory.
How can I add the Spark lib jars (including spark-core and its dependencies) to Oozie's shared lib? Any comment / answer is appreciated.
The Spark action is scheduled to be released with Oozie 4.2.0, though the doc seems to be a bit behind. See the related JIRA here: Oozie JIRA - Add Spark action executor
Cloudera's release CDH 5.4 already has it, though; see the official doc here: CDH 5.4 Oozie doc - Oozie Spark Action Extension
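For orientation, here is a minimal sketch of what a Spark action node might look like once the Spark action is available. It is written as a small Python helper that emits the XML. The element names and the `uri:oozie:spark-action:0.1` namespace follow the Oozie Spark action schema as I understand it; the node name, application name, main class, and jar path are hypothetical placeholders, and `${jobTracker}`, `${nameNode}`, and `${master}` are assumed to be defined in job.properties. Check it against the doc linked above before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace of the Spark action schema (assumed version 0.1).
SPARK_ACTION_NS = "uri:oozie:spark-action:0.1"


def build_spark_action(node_name, app_name, main_class, jar_path):
    """Return the XML text of one <action> node wrapping a <spark> element."""
    action = ET.Element("action", {"name": node_name})
    spark = ET.SubElement(action, "spark", {"xmlns": SPARK_ACTION_NS})
    # Child elements in the order the schema expects them.
    for tag, text in [
        ("job-tracker", "${jobTracker}"),
        ("name-node", "${nameNode}"),
        ("master", "${master}"),
        ("name", app_name),
        ("class", main_class),
        ("jar", jar_path),
    ]:
        ET.SubElement(spark, tag).text = text
    # Transitions to hypothetical "end" and "fail" nodes of the workflow.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    return ET.tostring(action, encoding="unicode")


if __name__ == "__main__":
    # All four arguments are made-up example values.
    print(build_spark_action(
        "spark-node", "MyApp", "com.example.MyApp",
        "${nameNode}/apps/my-app/lib/my-app.jar",
    ))
```

The printed fragment would go inside the workflow-app element of workflow.xml, next to the other action nodes.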
With older versions of Oozie, the jars can be shared through various approaches. The first approach may work best. The complete list follows anyway:
Below are the various ways to include a jar with your workflow:
Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties.
This is useful if you have many workflows that all need the same jar; you can put it in one place in HDFS and use it with many workflows. The jars will be available to all actions in that workflow. There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.
Create a directory named “lib” next to your workflow.xml in HDFS and put the jars in there.
This is useful if you have some jars that you only need for one workflow. Oozie will automatically make those jars available to all actions in that workflow.
Specify a tag in an action with the path to a single jar; you can have multiple such tags.
This is useful if you want some jars only for a specific action and not for all actions in a workflow. The downside is that you have to specify them in your workflow.xml, so if you ever need to add/remove some jars, you have to change your workflow.xml.
Add the jars to the ShareLib (e.g. /user/oozie/share/lib/lib_/pig)
While this will work, it’s not recommended for two reasons: First, the additional jars will be included with every workflow using that ShareLib, which may be unexpected to those workflows and users. Second, when upgrading the ShareLib, you’ll have to recopy the additional jars to the new ShareLib.
Quoted from Robert Kanter's blog here: How-to: Use the ShareLib in Apache Oozie (CDH 5)
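To make the first and third approaches from the quote more concrete, here is a rough Python sketch that generates the relevant job.properties lines and attaches per-action jar entries. It assumes the tag meant in the third approach is Oozie's `<file>` element (the tag name is missing in the text above). The helper names, the application path, the main class, and the jar paths are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def render_job_properties(app_path, lib_paths, use_sharelib=True):
    """Return job.properties text that sets oozie.libpath (first approach)."""
    lines = [
        "oozie.wf.application.path=" + app_path,
        # Comma-separated HDFS directories whose jars all actions can use.
        "oozie.libpath=" + ",".join(lib_paths),
        # Lets Oozie add its own ShareLib; it does not belong in oozie.libpath.
        "oozie.use.system.libpath=" + ("true" if use_sharelib else "false"),
    ]
    return "\n".join(lines) + "\n"


def add_file_tags(action_body, jar_paths):
    """Append one <file> element per jar to an action body (third approach)."""
    for jar in jar_paths:
        ET.SubElement(action_body, "file").text = jar
    return action_body


if __name__ == "__main__":
    # Hypothetical application path; the lib paths mirror the quoted example.
    print(render_job_properties(
        "${nameNode}/user/${user.name}/apps/my-wf",
        ["/path/to/jars", "another/path/to/jars"],
    ))

    # Hypothetical java action body with two jars attached to this action only.
    java = ET.Element("java")
    ET.SubElement(java, "main-class").text = "com.example.Main"
    add_file_tags(java, ["/path/to/jars/a.jar", "/path/to/jars/b.jar"])
    print(ET.tostring(java, encoding="unicode"))
```

The second approach needs no configuration at all: any jar placed in the lib directory next to workflow.xml is picked up by Oozie automatically.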