How to use Hadoop's InputFormat and OutputFormat in Spark

One of the things that i like about Spark is, that it allows you to use you MapReduce based InputFormat and OutputFormats for reading from and writing to. I wanted to try this i built the InputFormatOutputDriver class, that uses TextInputFormat for reading a file. Then uses that input to perform word count and finally uses TextOutputFormat for storing output As you can see most of the code is similar to WordCount program built using Apache Spark in Java , with difference that this is written in scala and following 2 lines When you want to use Hadoop API for reading data you should use sparkContext.newAPIHadoopFile() method, i am using version of the method that takes 4 parameters. First is path of input file, second parameter is the InputFormat class you want to use (I want to read file as Key - Value pair so i am using KeyValueTextInputFormat), then the next parameters is type of Key and Type of value, its Text for both key and value in my example and the last. Spark will read the file into a PairRDD[Text,Text], since i am only interested in the content of the file i am iterating through the keys and converting them from Text to String

val lines = sparkContext.newAPIHadoopFile(inputFile,classOf[KeyValueTextInputFormat], 
classOf[Text],classOf[Text]).keys.map(lineText => lineText.toString)
Once i have RDD[String] i can perform wordcount with it. But once the results are ready i am calling wordCountRDD.saveAsNewAPIHadoopFile() for storing data in Hadoop using TextOutputFormat.

wordCountRDD.saveAsNewAPIHadoopFile(outputFile,classOf[Text],classOf[IntWritable],
classOf[TextOutputFormat[Text,IntWritable]])

4 comments:

Anonymous said...

Thanks for helping me to understand basic Hadoop outputformat concepts. As a beginner in Hadoop your post help me a lot.
Hadoop Training in Velachery | Hadoop Training .
Hadoop Training in Chennai | Hadoop .

srjwebsolutions said...


We are leading responsive website designing and development company in Noida.
We are offering mobile friendly responsive website designing, website development, e-commerce website, seo service and sem services in Noida.

Responsive Website Designing Company in Noida
Website Designing Company in Noida
SEO Services in Noida
SMO Services in Noida

Vikas Chaudhary said...

Battery Mantra is Authorized exide car battery dealer in Noida and Greater Noida. We are providing our service in Indirapuram, Delhi, Ashok Nagar.

Exide Battery Dealer in Noida
Battery Dealer in Noida
Authorized Battery Dealer in Noida
Car Battery Dealer in Noida
Car Battery Dealer
Exide Battery Dealer

EG MEDI said...

Egmedi.com is online medical store pharmacy in laxmi nagar Delhi. You can Order prescription/OTC medicines online. Cash on Delivery available. Free Home Delivery


Online Pharmacy in Delhi
Buy Online medicine in Delhi
Online Pharmacy in laxmi nagar
Buy Online medicine in laxmi nagar
Onine Medical Store in Delhi
Online Medical store in laxmi nagar
Online medicine store in delhi
online medicine store in laxmi nagar
Purchase Medicine Online
Online Pharmacy India
Online Medical Store