python - PySpark Streaming example does not seem to terminate -
i trying understand python api of spark streaming simple example.
from pyspark.streaming import streamingcontext dvc = [[-0.1, -0.1], [0.1, 0.1], [1.1, 1.1], [0.9, 0.9]] dvc = [sc.parallelize(i, 1) in dvc] ssc = streamingcontext(sc, 2.0) input_stream = ssc.queuestream(dvc) def get_output(rdd): print(rdd.collect()) input_stream.foreachrdd(get_output) ssc.start() this prints the required output, prints lot of empty lists @ end , not terminate. can tell me might going wrong.
streaming in cases(unless terminated conditions in code) supposed infinite. purpose of streaming application consume data coming in @ regular intervals. , hence after processing first 4 rdds (i.e. [[-0.1, -0.1], [0.1, 0.1], [1.1, 1.1], [0.9, 0.9]]) have nothing in queue whereas spark streaming builds on notion new might come queuestream
if doing one-time etl might consider dropping streaming.
Comments
Post a Comment