WordCount (HelloWorld) MapReduce program

I am learning about MapReduce, and to experiment with it I created this simple program, which takes a text file as input and generates output showing how frequently each word appears in the file. You can download the source code for the program from here
  1. First I created a simple Mapper, which receives the content of the text file one line at a time. The Mapper splits each line into words, then writes every word to the output with a frequency count of 1 by calling context.write(word, one). In this case the word becomes the key and the count becomes the value.
  2. Next I developed a Reducer class, which receives a word as the key and the list of all counts for that word as the value. For example, if your input file is simple text like aaa bbb ccc aaa, the reduce method gets called with aaa - [1, 1], bbb - [1] and ccc - [1] as input. The Hadoop framework takes care of collecting the output of the Mapper and converting it into key - [value, value] format. In the reducer, the only thing I had to do was iterate through all the values and add them up. Once I have the sum, I write it as the output of the Reducer by calling context.write(key, new IntWritable(sum));
  3. The last part is WordCountDriver.java, a Java program that configures the Hadoop job by setting the inputs, defining the outputs and specifying the Mapper and Reducer classes. After configuring the job it calls job.waitForCompletion(true), which hands control to the Hadoop framework and waits for the job to complete.
  4. Now you can either use an existing .txt file on your machine or create a text file like this
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
  5. The last step is to run your Hadoop program. If you used Eclipse or another IDE to develop your code, you can run WordCountDriver.java directly from the IDE. The program takes 2 parameters: the input path and the output path. In my case the input file is on the local file system and I want the output stored on the local file system too, so I pass the following 2 parameters
    file:///Users/sunil/hadoop/sorttest.txt file:///Users/sunil/hadoop/output/wordcount
  6. Once the program finishes successfully, you should see a part-r-00000 file created on your local machine in /Users/sunil/hadoop/output/wordcount. If you open it you should see output like this
    XXX     3
    YYY     3
    ZZZ     3
    aaa     10
    bbb     5
    ccc     5
    ddd     5
    eee     5
    fff     5
    ggg     5
    hhh     5
    iii     5
If you want to run this program with a bigger text file, you can download a few good classic books from the data section of the Algorithm site.
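
To make the data flow in steps 1 and 2 easier to follow, here is a minimal plain-Java sketch that imitates the three phases (map, group by key, reduce) without any Hadoop dependency. It is not the downloadable source code for this post: the class name WordCountSketch and its method names are hypothetical, and the in-memory TreeMap is only a stand-in for the grouping and key sorting that the Hadoop framework performs between the Mapper and the Reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java imitation of the word count data flow; no Hadoop classes are used.
final class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Stand-in for the framework's grouping step: collect all values per key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: add up the list of counts received for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Run all three phases over the given lines and return word -> total count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Same one-line example as in step 2; prints each word, a tab, then its count.
        List<String> lines = Arrays.asList("aaa bbb ccc aaa");
        wordCount(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the real job, map corresponds to the Mapper calling context.write(word, one), shuffle is done for you by Hadoop, and reduce corresponds to the Reducer calling context.write(key, new IntWritable(sum)).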


Revanth Reddy said...

Really, it was a good example. It helped me a lot as a beginner.

Anonymous said...

Thanks Sunil,

Please explain where I need to execute the Mapper and Reducer programs. Is there some kind of utility I should run them with?

Mir said...

Good explanation of Hadoop and MapReduce.
I found more resources where you can find tested source code for MapReduce programs.
Refer to these:

top 10 map reduce program sources code

top 10 Read Write fs program using java api

top 30 hadoop shell commands
