WordCount (HelloWorld) MapReduce program

I am learning about MapReduce, and to experiment with it I created this simple program, which takes a text file as input and generates output showing how frequently each word appears in the file. You can download the source code for the program from here.
  1. I started by creating a simple Mapper, which receives the content of the text file one line at a time. The Mapper splits each line into words and writes every word to the output with a frequency count of 1 by calling context.write(word, one). In this case the word becomes the key and the count becomes the value.
  2. Next I developed a Reducer class, which receives a word as the key and a list of all the counts for that word as the value. For example, if your input file is simple text like aaa bbb ccc aaa, the reduce method is called with aaa - [1, 1], bbb - [1] and ccc - [1] as input. The Hadoop framework takes care of collecting the output of the Mapper and converting it into key - [value, value] format. In the Reducer, the only thing I had to do was iterate through all the values and add them up. Once I have the sum, I write it as the output of the Reducer by calling context.write(key, new IntWritable(sum));
  3. The last part is WordCountDriver.java, a Java program that configures the Hadoop job by setting up the inputs, defining the outputs and specifying the Mapper and Reducer classes. After this setup it calls job.waitForCompletion(true), which hands control to the Hadoop framework and waits for the job to complete.
  4. Now you can either use an existing .txt file on your machine or create a text file like this:
    
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    
  5. The last step is to run your Hadoop program. If you used Eclipse or another IDE to develop your code, you can run WordCountDriver.java directly from the IDE. The program takes 2 parameters: the input path and the output path. In my case the input file is on the local file system and I want the output stored on the local file system too, so I pass the following 2 parameters:
    
    file:///Users/sunil/hadoop/sorttest.txt file:///Users/sunil/hadoop/output/wordcount
    
  6. Once the program finishes successfully, you will see a part-r-00000 file created on your local machine at /Users/sunil/hadoop/output/wordcount. If you open it you should see output like this:
    
    XXX     3
    YYY     3
    ZZZ     3
    aaa     10
    bbb     5
    ccc     5
    ddd     5
    eee     5
    fff     5
    ggg     5
    hhh     5
    iii     5
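
The Mapper and Reducer steps above can be tried out without a cluster. The following is a minimal plain-Java sketch, not the code from this post and not the Hadoop API: the class name WordCountSketch and its map, shuffle, reduce and countWords methods are hypothetical stand-ins that imitate what the Mapper, the framework's grouping step and the Reducer do for the aaa bbb ccc aaa example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration only: imitates the map -> group -> reduce flow
// with plain collections. No Hadoop classes are used here.
class WordCountSketch {

    // "Map" step: split one line into words and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Grouping step (done by the framework in Hadoop): collect all values
    // emitted for the same key, e.g. aaa -> [1, 1]. TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: add up all the counts received for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three steps over all input lines and returns word -> total.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = countWords(Arrays.asList("aaa bbb ccc aaa"));
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Running main prints aaa 2, bbb 1 and ccc 1, one word per line with a tab between the word and its count. In a real job the grouping step is not something you write yourself; Hadoop performs it between the Mapper and the Reducer.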
    
If you want to run this program with a bigger text file, you can download a few good classic books from the data section of the Algorithm site.

40 comments:

  1. It was a really good example. It helped me a lot as a beginner.

  2. There are lots of information about latest technology and how to get trained in them, like Best Hadoop Training in Chennai have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get trained in future technologies(Best hadoop training institute in chennai). By the way you are running a great blog. Thanks for sharing this.

  3. Thanks Sunil,

    Please explain where I need to execute the Mapper and Reducer program. Is there a utility I should use to run it?

  4. Really great post.Thanks for sharing this blog.It helps me to get a good job.Keep sharing.
  5. Excellent article. Hadoop is a cloud based tool.It give more information about massive storage and it helps to improve our skills. Hadoop provides more job opportunities.To achieve a great career join with us.
    Regards,
    Arjun
  6. Big data is the next big thing in the information technology space. According to a recent survey there is a huge demand for professional big data analysts who are capable of processing large data so that the enterprise objective are met. Join Fita and get trained from the professional big data analysts who are working for corporates. Join FITA and stay ahead in your career.
  7. Thanks for giving Good Example. Fantastic article, Viral. Very well written, clear and concise. One of the best links explaining one to many and hierarchy in Hadoop.
  8. Great!! This is such an informative content. It will help for Beginner. Keep it up.

  9. In order to write MapReduce applications you need to have an understanding of how data is transformed as it executes in the MapReduce framework.
  10. Learned a lot of new things from your post! Good creation and HATS OFF to the creativity of your mind. Very interesting and useful blog!
  11. Hi ,Your post on MapReduce Program was easy to understand I used the codes with little modifications,thanks for your post Hadoop Training in Velachery | Hadoop Training .

  12. Hello,
    Great post! I am see the programming coding and step by step execute the outputs.I am gather this coding more information. It's helpful for me. Also great blog here with all of the valuable information you have. Know More Info About The Best MapReduce Certification.

  13. What a fantastic read on Hadoop. This has helped me understand a lot in Hadoop course. Please keep sharing similar write ups on Hadoop. Guys if you are keen to knw more on Hadoop, must check this wonderful Hadoop tutorial and i'm sure you will enjoy learning on Hadoop training.https://www.youtube.com/watch?v=1jMR4cHBwZE

  14. Much obliged for sharing such an awesome information.Its extremely pleasant and useful.
  15. Great and useful blog admin, I would like to read more. Your step by step coding is really understandable. Continue sharing more like this.
  16. very informative blog and useful article thank you for sharing with us, keep posting Big data hadoop online Course

  17. Thanks for sharing the valuable information, keep sharing.
  18. Very clear explanation. Please share more like that..


  19. Very Impressive Big Data Hadoop tutorial. The content seems to be pretty exhaustive and excellent and will definitely help in learning Big Data Hadoop course. I'm also a learner taken up Big Data Hadoop Tutorial and I think your content has cleared some concepts of mine. While browsing for Hadoop tutorials on YouTube i found this fantastic video on Big Data Hadoop Tutorial.Do check it out if you are interested to know more.https://www.youtube.com/watch?v=nuPp-TiEeeQ&

  20. The great article..thanks for sharing nice program information..
  21. I just like the helpful information you provide in your articles. I will bookmark your blog and take a look at once more here regularly.
    I am somewhat certain I’ll be informed plenty of new stuff right here! Good luck for the following!

  22. hey...It is highly comprehensive and elaborated. Thanks for sharing!

  23. Nice post, I would like to see more articles/blogs. I am also a content writer and writing a blog you can review it. immigration consultants in Delhi

  24. Excellent article. Hadoop is a cloud based tool.It give more information about massive storage and it helps to improve our skills. Hadoop provides more job opportunities.To achieve a great career join with us.
  25. Its such as you learn my mind! You appear to grasp so much approximately this, such as you wrote the book in it or something. I think that you could do with some percent to pressure the message home a little bit, but instead of that, this is excellent blog. An excellent read. I will definitely be back.
  26. Wow! Such an amazing and helpful post this is. I really really love it. I hope that you continue to do your work like this in the future also.

  28. I really happy found this website eventually. Really informative and inoperative, Thanks for the post and effort! Please keep sharing more such blog.

  29. Fine way of telling, and pleasant post. Nice info! Thanks a lot for sharing it, that’s truly has added a lot to our knowledge about this topic. Have a more successful day. Amazing write-up, always find something interesting.
    Thanks

  30. Nice! your blog contained very useful information for us! You explained the map reduce method very well with programmes. best python training course in delhi

  31. I am so grateful for your article.
  32. Great job! Your blog provided incredibly valuable insights for us! The way you elucidated the map reduction technique alongside practical examples was truly enlightening.
  33. Wow, what an incredibly informative article!