Hello Spark Streaming

In the WordCount program built using Apache Spark in Java , i built simple Spark program that takes name of the file as input, reads the file and performs word count on the file. Now Spark also has concept of Spark Streaming which allows you to read file as stream of real time events instead of one time load of input file. But the API for transforming the data, in both cases Spark converts the input in RDD. In case of Spark Streaming it would convert the input events into Micro RDD which is nothing but collecting all the incoming data for certain duration (microBatchTime) and then exposes it as RDD. I built this simple NetcatStreamClient Streaming application that listens for incoming data on netcat port, once it has data it performs wordcount on it and prints that to console. You can download the full source code from GitHub

package com.spnotes.spark

import com.typesafe.scalalogging.Logger
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.slf4j.LoggerFactory

object NetcatStreamClient{
  val logger = Logger(LoggerFactory.getLogger("NetcatStreamClient"))

  def main(argv:Array[String]): Unit ={
    logger.debug("Entering NetcatStreamClient.main")
    if(argv.length != 3){
      println("Please provide 3 parameters   ")
    val hostName =argv(0)
    val port = argv(1).toInt
    val microBatchTime = argv(2).toInt

    logger.debug(s"Listening on $hostName at $port batching records every $microBatchTime")

    //Create Spark Configuration
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    //Create SparkStreamingContext with microBatchTime which specifies how long spark should collect data
    val sparkStreamingContext = new StreamingContext(sparkConf,Seconds(microBatchTime))
    //Start listening for data on given host and port
    val lines = sparkStreamingContext.socketTextStream(hostName,port)
// Logic for implementing word count on the input batch
    lines.flatMap(_.split(" ")).map(word => (word,1)).reduceByKey(_+_).print()
    logger.debug("Number of words " + lines.count())

    //Start the stream so that data starts flowing, you must define transformation logic before calling start()
    logger.debug("Exiting NetcatStreamClient.main")
Once your project is built using mvn clean compile assembly:single, first thing you should do is executing following command to start netcat on localhost at port 9999 nc -l 9999
Next execute following code to start Spark Driver that takes 3 parameters host and port where netcat is listening and last parameter is how the batch duration should be bin/spark-submit ~/HelloSparkStreaming-1.0-SNAPSHOT-jar-with-dependencies.jar localhost 9999 5


srjwebsolutions said...

We are leading responsive website designing and development company in Noida.
We are offering mobile friendly responsive website designing, website development, e-commerce website, seo service and sem services in Noida.

Responsive Website Designing Company in Noida
Website Designing Company in Noida
SEO Services in Noida
SMO Services in Noida

Vikas Chaudhary said...

Battery Mantra is Authorized exide car battery dealer in Noida and Greater Noida. We are providing our service in Indirapuram, Delhi, Ashok Nagar.

Exide Battery Dealer in Noida
Battery Dealer in Noida
Authorized Battery Dealer in Noida
Car Battery Dealer in Noida
Car Battery Dealer
Exide Battery Dealer

EG MEDI said...

Egmedi.com is online medical store pharmacy in laxmi nagar Delhi. You can Order prescription/OTC medicines online. Cash on Delivery available. Free Home Delivery

Online Pharmacy in Delhi
Buy Online medicine in Delhi
Online Pharmacy in laxmi nagar
Buy Online medicine in laxmi nagar
Onine Medical Store in Delhi
Online Medical store in laxmi nagar
Online medicine store in delhi
online medicine store in laxmi nagar
Purchase Medicine Online
Online Pharmacy India
Online Medical Store