WordCount (HelloWorld) MapReduce program

I am learning about MapReduce, and to experiment with it I created this simple program, which takes a text file as input and generates an output listing how many times each word appears in that file. You can download the source code for the program from here
  1. First I created a simple Mapper, which receives the content of the text file one line at a time. The Mapper splits each line into words and writes every word to the output with a frequency count of 1 by calling context.write(word,one). In this case the word becomes the key and the count becomes the value
  2. Next I developed a Reducer class, which receives a word as the key and a list of all the counts for that word as the value. For example, if your input file is simple text like aaa bbb ccc aaa, the reduce method gets called with aaa - [1, 1], bbb - [1] and ccc - [1] as input. The Hadoop framework takes care of collecting the output of the Mapper and grouping it into key - [value, value] format. In the Reducer, the only thing I had to do was iterate through all the values and add them up. Once I have the sum, I write it as the output of the Reducer by calling context.write(key, new IntWritable(sum));
  3. The last part is creating WordCountDriver.java, a Java program that configures the Hadoop job by setting up the inputs, defining the outputs and specifying the names of the Mapper and Reducer classes. After configuring the job it calls job.waitForCompletion(true), which hands control to the Hadoop framework and waits for the job to complete
  4. Now you can either use one of the existing .txt files on your machine, or create a text file like this
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
  5. The last step is to run your Hadoop program. If you used Eclipse or some other IDE to develop your code, you can run WordCountDriver.java directly from the IDE. The program takes 2 parameters: the input path and the output path. In my case the input file is on the local file system and I want the output stored on the local file system too, so I pass the following 2 parameters
    file:///Users/sunil/hadoop/sorttest.txt file:///Users/sunil/hadoop/output/wordcount
  6. Once the program has finished successfully, you should see a part-r-00000 file created on your local machine at /Users/sunil/hadoop/output/wordcount. If you open it, you should see output like this
    XXX     3
    YYY     3
    ZZZ     3
    aaa     10
    bbb     5
    ccc     5
    ddd     5
    eee     5
    fff     5
    ggg     5
    hhh     5
    iii     5
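
For reference, here is a minimal sketch of what the three classes described in steps 1 to 3 might look like. It is not the downloadable source mentioned above; it is written against the standard org.apache.hadoop.mapreduce API and assumes Hadoop 2.x or later on the classpath (for Job.getInstance). Only the name WordCountDriver comes from this post; the nested class names WordCountMapper and WordCountReducer, and the job name "word count", are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Step 1: called once per input line; emits (word, 1) for every word.
    // The class name WordCountMapper is an assumption for this sketch.
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, one);
            }
        }
    }

    // Step 2: called once per word with all of its counts; emits (word, sum).
    // The class name WordCountReducer is an assumption for this sketch.
    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Step 3: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and block until it finishes; true prints progress.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that Hadoop refuses to start the job if the output directory already exists, so delete the output folder (or pass a new path) before running it a second time.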
If you want to run this program with a bigger text file, you can download a few good classic books from the data section of the Algorithm site
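
If you want to see what happens between the Mapper and the Reducer without installing Hadoop, the flow can be imitated in plain Java. The sketch below is only a single-process simulation, not Hadoop code: the class name WordCountSimulation and its methods map, shuffle, reduce and wordCount are hypothetical names chosen for this example, and a sorted in-memory map stands in for the grouping that the framework does for you.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSimulation {

    // Map phase: split one line on whitespace and emit a (word, 1) pair per word.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all values by key, with keys in sorted order.
    // In a real job the Hadoop framework performs this step.
    public static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: add up all the counts collected for one word.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run map, shuffle and reduce over all lines and return word -> count.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Reducer input for the one-line example from step 2.
        System.out.println(shuffle(map("aaa bbb ccc aaa"))); // {aaa=[1, 1], bbb=[1], ccc=[1]}
        // Final counts for the same line.
        System.out.println(wordCount(Arrays.asList("aaa bbb ccc aaa"))); // {aaa=2, bbb=1, ccc=1}
    }
}
```

Feeding the five-line sample file from step 4 through wordCount gives aaa a count of 10 and every other word a count of 5, which matches the aaa to iii rows of the output in step 6.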


Revanth Reddy said...

It really was a good example. It helped me a lot as a beginner.

Ritesh said...

Thanks Sunil,

Please explain where I need to execute the Mapper and Reducer programs. Is there some kind of utility where I should run them?
