Using the output of one MapReduce program as input to another MapReduce program - KeyValueTextInputFormat

In the WordCount (HelloWorld) MapReduce program post I described how to create a MapReduce program that takes a text file as input and generates output giving the frequency of each word in the input file. I wanted to take that a step further by reading the output generated by the first MapReduce program and figuring out which word is used most frequently and how many times it is used. So I developed this HadoopWordCountProcessor program to do that.
  1. First take a look at the output generated by the HadoopWordCount program. In that program I used TextOutputFormat as the output format class, which writes one key-value pair per line, with the key and value separated by a tab character. The output looks like this:
       XXX	3
       YYY	3
       ZZZ	3
       aaa	10
       bbb	5
       ccc	5
       ddd	5
       eee	5
       fff	5
       ggg	5
       hhh	5
       iii	5
  2. Next create a WordCountProcessorMapper.java class. It receives Text as both key and value (the word and its count). The only thing I am doing here is converting the Text value into an IntWritable and writing the pair to the output.
  3. The reducer class is where I receive each word as the key and its frequency as the value. In this class I keep track of the highest-frequency word. (You have to copy the key and value of the most frequently occurring word into local variables for this to work, because Hadoop reuses the key and value objects passed to the reducer.)
  4. The last step is to create a Driver class. Note one thing about it: I call job.setInputFormatClass(KeyValueTextInputFormat.class); to set KeyValueTextInputFormat as the input format class. Once I do that, Hadoop takes care of reading the input, breaking each line into a key and a value, and passing them to my Mapper class.
  5. The next step is to execute the WordCountProcessor.java class with the output of the first MapReduce program as input, passing two arguments like this:
       file:///Users/gpzpati/hadoop/output/wordcount file:///Users/gpzpati/hadoop/output/wordcount2
     It generates the following output, which says that aaa is the most frequently used word and that it appeared 10 times:
       aaa	10
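
The logic of steps 2 to 4 can be roughly sketched in plain Java, without any Hadoop classes. This is not the original WordCountProcessor code. The class name TopWordSketch and the methods splitKeyValue and mostFrequent are hypothetical names made up for illustration. The sketch assumes every input line is well formed (word, tab, integer count). It splits each line at the first tab, which is how KeyValueTextInputFormat behaves with its default separator, then parses the count and remembers the largest one.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; a real job would use Hadoop's Mapper and Reducer classes.
class TopWordSketch {

    // Split a line at the first tab. If there is no tab, the whole line
    // becomes the key and the value is empty.
    static Map.Entry<String, String> splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    // Walk over all "word<TAB>count" lines and keep the pair with the largest count.
    // Returns null when the input is empty.
    static Map.Entry<String, Integer> mostFrequent(List<String> lines) {
        String topWord = null;
        int topCount = Integer.MIN_VALUE;
        for (String line : lines) {
            Map.Entry<String, String> kv = splitKeyValue(line);
            // The "mapper" part: turn the textual count into an int.
            int count = Integer.parseInt(kv.getValue().trim());
            // The "reducer" part: remember the best pair seen so far.
            // Java Strings are immutable, so plain assignment is enough here;
            // with Hadoop's reused key and value objects a copy would be needed.
            if (count > topCount) {
                topWord = kv.getKey();
                topCount = count;
            }
        }
        return topWord == null ? null : new SimpleEntry<>(topWord, topCount);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "XXX\t3", "YYY\t3", "ZZZ\t3", "aaa\t10",
                "bbb\t5", "ccc\t5", "ddd\t5", "eee\t5");
        Map.Entry<String, Integer> top = mostFrequent(lines);
        System.out.println(top.getKey() + "\t" + top.getValue());
    }
}
```

Because the comparison is a strict greater-than, a later word with the same count does not replace the current leader, so in this sketch a tie is resolved in favour of the first word seen.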

Comments:

Anonymous said...

And what if there are joint contenders for the top position? E.g. fox 30 times and dog 30 times?
