Creating a custom Partitioner class for your MapReduce program

The MapReduce framework uses an instance of the org.apache.hadoop.mapreduce.Partitioner class to figure out which map output key goes to which reducer. By default it uses org.apache.hadoop.mapreduce.lib.partition.HashPartitioner, which calculates a hash value for the key, divides it by the number of reducers in the job and uses the remainder to pick the reducer the key goes to. This implementation works well as long as the keys produce hashCodes with a uniform distribution, but in some exceptional cases you might want to take control of how the output of the Mapper gets distributed to the reducers. I just wanted to figure out how this works, so I decided to change my WordCount (HelloWorld) MapReduce program to add a custom Partitioner that sends words starting with upper and lower case letters to 2 different reducers. I followed these steps
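For reference, the default partitioning logic is tiny; its getPartition() boils down to something like the sketch below (my own rewrite of what the stock HashPartitioner does, not code from this example): mask off the sign bit of the key's hashCode and take the remainder modulo the number of reducers.

    import org.apache.hadoop.mapreduce.Partitioner;

    // Roughly what org.apache.hadoop.mapreduce.lib.partition.HashPartitioner does:
    // take the key's hashCode, clear the sign bit and mod by the reducer count
    public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }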
  1. First I created a WordCountPartitioner.java class (see the sketch after this list). The first thing it does is check whether there are 2 reducers; if yes, it looks at the first letter of the key to figure out whether it starts with a lower case letter (simply by comparing it against the letter 'a'). If it does, the key is sent to the first reducer, otherwise to the second reducer.
  2. I had to make a few changes in the Driver program to use my WordCountPartitioner (see the second sketch after this list):
    • job.setNumReduceTasks(2): This call asks the MapReduce framework to use 2 reducers
    • job.setPartitionerClass(WordCountPartitioner.class): This call sets my WordCountPartitioner as the partitioner class for the job
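The partitioner class looks roughly like this. It is a sketch reconstructed from the description in step 1, so the exact comparison in the original code may differ slightly, but the idea is the same: lower case letters compare at or after 'a', upper case letters before it.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordCountPartitioner extends Partitioner<Text, IntWritable> {

        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            // If the job is not running with 2 reducers there is nothing to split,
            // so send everything to the only partition
            if (numReduceTasks != 2) {
                return 0;
            }
            // Compare the first character of the word against 'a'
            char firstChar = key.toString().charAt(0);
            if (firstChar >= 'a') {
                return 0; // words starting with a lower case letter go to the first reducer
            }
            return 1;     // everything else (upper case) goes to the second reducer
        }
    }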
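And these are the relevant lines in the Driver. Everything else is the standard WordCount setup, and the WordCountDriver class name here is just a placeholder for whatever your driver class is called.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "wordcount");
            job.setJarByClass(WordCountDriver.class);
            // ... mapper, reducer, key/value types and input/output paths
            //     are configured exactly as in the plain WordCount program ...

            // The two changes described above
            job.setNumReduceTasks(2);                             // use 2 reducers
            job.setPartitionerClass(WordCountPartitioner.class);  // use my custom partitioner

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }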
This screenshot shows how my sample.txt got divided into 2 reducer outputs. The first 2 lines show the output with the default HashPartitioner, and the next 2 lines show the output when I used my custom Partitioner.
