hadoop - Spark insert to HBase slow


I am inserting into HBase using Spark, but it's slow: 60,000 records take 2-3 minutes. I have 10 million records to save.
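To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes the write rate stays constant as the data grows, which may not hold in practice; `ThroughputEstimate` and `projectedSeconds` are names made up for this illustration.

```scala
object ThroughputEstimate {
  // Seconds needed to write `total` records at the rate observed for `sample` records.
  // Assumes throughput scales linearly, which is only an approximation.
  def projectedSeconds(total: Long, sample: Long, sampleSeconds: Double): Double =
    total.toDouble / sample * sampleSeconds

  def main(args: Array[String]): Unit = {
    val total = 10000000L // records to save (from the question)
    val sample = 60000L   // records written in the observed run
    for (minutes <- Seq(2, 3)) {
      val sampleSeconds = minutes * 60.0
      val rate = sample / sampleSeconds
      val hours = projectedSeconds(total, sample, sampleSeconds) / 3600
      println(f"$minutes min per 60k: $rate%.0f records/s, about $hours%.1f hours for 10M")
    }
  }
}
```

At the observed rate of roughly 333 to 500 records per second, 10 million records would take somewhere between about 5.6 and 8.3 hours.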

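import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

object WriteToHbase extends Serializable {
    // sc, UserTable, tablename, cf and q are defined elsewhere (not shown).
    def main(args: Array[String]): Unit = {
        val csvRows: RDD[Array[String]] = ...
        val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
        val usersRDD = csvRows.map(row => {
            new UserTable(row(0), row(1), row(2), row(9), row(10), row(11))
        })
        processUsers(sc, usersRDD, dateFormatter)
    }

    def processUsers(sc: SparkContext, usersRDD: RDD[UserTable], dateFormatter: DateTimeFormatter): Unit = {
        // One HTable per partition, opened on the executor that processes it.
        usersRDD.foreachPartition(part => {
            val conf = HBaseConfiguration.create()
            val table = new HTable(conf, tablename)

            part.foreach(userRow => {
                val id = userRow.id
                val name = userRow.name
                val date1 = dateFormatter.parseDateTime(userRow.date1)
                val hRow = new Put(Bytes.toBytes(id))
                // Bytes.toBytes has no DateTime overload; storing epoch millis here is an assumption.
                hRow.add(cf, q, Bytes.toBytes(date1.getMillis))
                hRow.add(cf, q, Bytes.toBytes(name))
                ...
                table.put(hRow)
            })
            table.flushCommits()
            table.close()
        })
    }
}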

I am using these options in spark-submit:

--num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 2  

It's slow because the implementation doesn't leverage the proximity of the data; a piece of the Spark RDD on one server may be transferred to an HBase RegionServer running on another server.

Currently there is no Spark RDD operation that uses the HBase data store in an efficient manner.
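Separately from data locality, it may be worth checking how many round trips the write loop makes. The sketch below is only an illustration of grouping a partition's rows into batches. It assumes each individual put results in its own request to the server, which depends on the client's auto-flush setting. `CountingSink` and `writePartition` are hypothetical stand-ins that count calls and do not talk to HBase.

```scala
object BatchingSketch {
  // Hypothetical stand-in for a table handle: counts calls instead of issuing RPCs.
  final class CountingSink {
    var roundTrips = 0
    var rows = 0
    def put(batch: Seq[String]): Unit = {
      roundTrips += 1
      rows += batch.size
    }
  }

  // Sends one partition's rows to the sink in groups of at most `batchSize`.
  def writePartition(part: Iterator[String], sink: CountingSink, batchSize: Int): Unit =
    part.grouped(batchSize).foreach(batch => sink.put(batch))

  def main(args: Array[String]): Unit = {
    def rows: Iterator[String] = Iterator.tabulate(60000)(i => s"row-$i")

    val perRecord = new CountingSink
    writePartition(rows, perRecord, 1)

    val batched = new CountingSink
    writePartition(rows, batched, 1000)

    println(s"per-record: rows=${perRecord.rows} roundTrips=${perRecord.roundTrips}")
    println(s"batched:    rows=${batched.rows} roundTrips=${batched.roundTrips}")
  }
}
```

For 60,000 rows this makes 60,000 calls one at a time versus 60 calls in batches of 1,000. In the real job the same grouping could be applied to `part` inside `foreachPartition`, since `HTable` also has a `put` overload that takes a `java.util.List[Put]`; whether that helps depends on how the client is configured.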

