The MapReduce framwork uses instance of
org.apache.hadoop.mapreduce.Partitioner
class to figure our which mapreduce output key goes to which reducer. By default it uses
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner
, this class calculates hash value for the key and divides it by number of Reducers in the program and uses remainder to figure out the reducer it goes to. This implementation is pretty good and as long as the keys generate hashCodes that gives uniform distribution it should be good.
But in some exceptional cases you might want to take control of how the output of Mapper gets distributed to Reducers. I just wanted to figure out how this works, so i decided to change my
WordCount(HelloWorld) MapReduce program to add a custom Partitioner that sends upper and lower case alphabets two 2 different reducers. I followed these steps
- First i did create a WordCountPartitioner.java class like this
First thing i am doing is checking if there are 2 reducers if yes i am using the first letter of the key to figure out if it starts with lower case letter (simply check it against 'a' letter, if yes send it to first reducer if not send it to second reducer
-
I had to make few changes in the Driver program to use my WordCountPartitioner
- job.setNumReduceTasks(2): This call is asking MapReduce framework to use 2 reducers
- job.setPartitionerClass(WordCountPartitioner.class); This call is setting my
WordCountPartitioner
as the class for partitioner
This screen shot shows how my sample.txt got divided into 2 reducer outputs. First 2 lines show output with default HashPartitioner and 2nd 2 lines show output when i used my custom Partitioner
Nice Article Sunil :)
ReplyDeleteGood Article !!
ReplyDeleteNicely explained.
well explained :) thank you bro :)
ReplyDeleteKeep going
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteReally helpful...Thanks
ReplyDeleteI just want to know about Map Reduce program and found this post this post is perfect one ,Thanks for sharing the informative post of Map Reduce and able to understand the concepts easily,Thoroughly enjoyed reading
ReplyDeleteCheck out the
https://www.credosystemz.com/training-in-chennai/best-hadoop-training-in-chennai/
hii that post on hadoop mapreduce was really good and informative do posting your blog as i regularly read your blog to clear my doubts Hadoop Training in Velachery | Hadoop Training .
ReplyDeleteHadoop Training in Chennai | Hadoop .
Thanks for info....
ReplyDeleteWebsite development in Bangalore