Sunil's Notes: Creating a custom Partitioner class for your MapReduce program

The MapReduce framework uses an instance of the org.apache.hadoop.mapreduce.Partitioner class to figure out which Reducer each Mapper output key goes to. By default it uses org.apache.hadoop.mapreduce.lib.partition.HashPartitioner, which calculates the hash value of the key, divides it by the number of Reducers in the job and uses the remainder to pick the Reducer. This implementation works well as long as the keys produce hashCodes that are uniformly distributed. But in some exceptional cases you might want to take control of how the output of the Mapper gets distributed to the Reducers. I wanted to figure out how this works, so I changed my WordCount (HelloWorld) MapReduce program to add a custom Partitioner that sends words starting with upper case and lower case letters to 2 different reducers. I followed these steps:
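To make the default behaviour concrete, here is a minimal plain-Java sketch of the arithmetic HashPartitioner applies: it clears the sign bit of the key's hashCode (so a negative hash still gives a valid index) and takes the remainder modulo the number of reducers. It uses java.lang.String keys instead of Hadoop's Text, whose hashCode is computed differently, so the reducer numbers Hadoop picks for the same words may differ. The class name HashPartitionDemo is hypothetical.

```java
// Plain-Java sketch of the HashPartitioner arithmetic.
// Assumption: String stands in for Hadoop's Text key type, so the
// actual partition numbers may differ from a real Hadoop run.
public class HashPartitionDemo {

    // Clear the sign bit so a negative hashCode still yields a
    // non-negative index, then take the remainder modulo the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "Banana", "cherry", "Date"};
        for (String w : words) {
            System.out.println(w + " -> reducer " + getPartition(w, 2));
        }
    }
}
```

Note that the result depends only on the hash, not on the letters themselves, which is why upper case and lower case words end up mixed across reducers by default.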

First, I created a WordCountPartitioner.java class. The first thing it does is check whether the job has 2 reducers. If it does, it looks at the first letter of the key to figure out whether the key starts with a lower case letter (simply by checking it against the letter 'a'): if yes, the key is sent to the first reducer; if not, it is sent to the second reducer.
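Here is a minimal sketch of the rule just described, written against plain Strings so it runs without Hadoop. In an actual job the same logic would sit inside `public int getPartition(Text key, IntWritable value, int numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner<Text, IntWritable>`, with `key.toString()` supplying the word. The class name FirstLetterRule, the empty-key guard and the hash fallback for reducer counts other than 2 are assumptions for illustration, not details taken from the original class.

```java
// Sketch of the first-letter partitioning rule (hypothetical class name).
// In Hadoop this body would live in Partitioner<Text, IntWritable>.getPartition.
public class FirstLetterRule {

    static int getPartition(String word, int numPartitions) {
        if (numPartitions == 2) {
            // Lower case ASCII letters 'a'..'z' have code points >= 'a';
            // upper case letters and digits are below it.
            if (!word.isEmpty() && word.charAt(0) >= 'a') {
                return 0; // first reducer
            }
            return 1; // second reducer
        }
        // Assumption: for any other reducer count, fall back to the
        // same arithmetic HashPartitioner uses.
        return (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hadoop", "Hadoop", "mapreduce", "MapReduce"}) {
            System.out.println(w + " -> reducer " + getPartition(w, 2));
        }
    }
}
```

Comparing against 'a' is a shortcut rather than a full case test: a few ASCII symbols such as '{' and '~', and non-ASCII characters, also compare greater than 'a', so Character.isLowerCase would be a stricter alternative.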
I had to make a few changes in the Driver program to use my WordCountPartitioner:
- job.setNumReduceTasks(2): asks the MapReduce framework to use 2 reducers.
- job.setPartitionerClass(WordCountPartitioner.class): sets my WordCountPartitioner as the partitioner class for the job.
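To see what these two Driver settings amount to, the following plain-Java simulation (not Hadoop code) routes each word of a hypothetical input line to one of 2 in-memory "reducers" using the first-letter rule and sums the counts there, roughly mimicking the two part-r-0000N output files a job with 2 reducers writes. The input text and the class name TwoReducerSimulation are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a word count run with 2 reducers and a
// first-letter partitioner. It only mimics routing and summing.
public class TwoReducerSimulation {

    static int firstLetterPartition(String word, int numPartitions) {
        if (numPartitions == 2) {
            return (!word.isEmpty() && word.charAt(0) >= 'a') ? 0 : 1;
        }
        return (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Returns one sorted word -> count map per simulated reducer.
    static List<Map<String, Integer>> run(String text, int numReduceTasks) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeMap<>()); // keys kept sorted, like reducer input
        }
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue; // skip leading-whitespace artifact
            int p = firstLetterPartition(word, numReduceTasks);
            reducers.get(p).merge(word, 1, Integer::sum); // "reduce": sum counts
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> out = run("Hello world hello Hadoop world", 2);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("part-r-0000" + i + ": " + out.get(i));
        }
    }
}
```

With this input, the lower case words hello and world land in the first map and Hello and Hadoop in the second; with the default HashPartitioner the split would follow the hash values instead.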

This screenshot shows how my sample.txt got divided into 2 reducer outputs. The first 2 lines show the output with the default HashPartitioner, and the next 2 lines show the output when I used my custom Partitioner.
