WordCount (HelloWorld) MapReduce program

I am learning about MapReduce, and to experiment with it I created this simple program, which takes a text file as input and generates output showing how often each word appears in the file. You can download the source code for the program from here
  1. I started by creating a simple Mapper, which receives the content of the text file one line at a time. The Mapper splits each line into words and writes every word to the output with a frequency count of 1 by calling context.write(word, one). Here the word becomes the key and the count becomes the value.
  2. Next I developed a Reducer class, which receives a word as the key and a list of all the counts for that word as the value. For example, if your input file is the simple text aaa bbb ccc aaa, then the reduce method is called with aaa - [1, 1], bbb - [1] and ccc - [1] as input. The Hadoop framework takes care of collecting the output of the Mapper and converting it into this key - [value, value] format. In the Reducer, the only thing I had to do was iterate through all the values and add them up. Once I have the sum, I write it as the output of the Reducer by calling context.write(key, new IntWritable(sum));
  3. The last part is creating WordCountDriver.java, a Java program that configures the Hadoop job by setting up the inputs, defining the outputs and specifying the Mapper and Reducer classes. After this setup it calls job.waitForCompletion(true), which submits the job to the Hadoop framework and waits for it to complete.
  4. Now you can either use an existing .txt file on your machine or create a text file like this
    
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    XXX YYY ZZZ
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa 
    hhh eee  iii bbb ccc fff ddd  ggg aaa aaa
    
  5. The last step is to run your Hadoop program. If you used Eclipse or another IDE to develop your code, you can run WordCountDriver.java directly from the IDE. The program takes 2 parameters, the input path and the output path. In my case the input file is on the local file system and I want the output stored on the local file system too, so I pass the following 2 parameters
    
    file:///Users/sunil/hadoop/sorttest.txt file:///Users/sunil/hadoop/output/wordcount
    
  6. Once the program finishes successfully, you should see a part-r-00000 file created on your local machine in /Users/sunil/hadoop/output/wordcount. If you open it, you should see output like this
    
    XXX     3
    YYY     3
    ZZZ     3
    aaa     10
    bbb     5
    ccc     5
    ddd     5
    eee     5
    fff     5
    ggg     5
    hhh     5
    iii     5
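
The data flow from steps 1 and 2 can be modeled in plain Java, without Hadoop on the classpath. This is only a sketch for building intuition, not the downloadable source: the class WordCountSketch and its methods map, shuffle and reduce are hypothetical names, and a TreeMap stands in for the grouping and key sorting that the Hadoop framework does between the Mapper and the Reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java model of the WordCount data flow; no Hadoop classes are used.
class WordCountSketch {

    // Mapper stand-in: emit a (word, 1) pair for every token in one line.
    // StringTokenizer skips runs of whitespace, so double spaces yield no empty words.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle stand-in: collect the emitted values per key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reducer stand-in: sum the list of counts collected for one word.
    static int reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs all three phases over the given lines and returns word -> total.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Same input as the example in step 2; prints {aaa=2, bbb=1, ccc=1}
        System.out.println(wordCount(Arrays.asList("aaa bbb ccc aaa")));
    }
}
```

The TreeMap orders keys by character code, so uppercase letters come before lowercase ones. As far as I understand, Hadoop sorts Text keys by their bytes in the same way, which would explain why XXX, YYY and ZZZ appear before aaa in the output above.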
    
If you want to run this program with a bigger text file, you can download a few good classic books from the data section of the Algorithm site
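
With a book-sized input the part-r-00000 file from step 6 gets long, so it helps to pull out the most frequent words. The following is a rough sketch in plain Java and is not part of the downloadable source: the class TopWords and its method names are made up for illustration, and it assumes each output line is a word followed by whitespace (a tab is the default separator in Hadoop's text output) and a count, as in the listing in step 6.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting WordCount output; uses no Hadoop classes.
class TopWords {

    // Parses lines such as "aaa<TAB>10" into (word, count) pairs; blank lines are skipped.
    static List<Map.Entry<String, Integer>> parse(List<String> lines) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            entries.add(new SimpleEntry<>(fields[0], Integer.parseInt(fields[1])));
        }
        return entries;
    }

    // Returns the n most frequent words, highest count first; ties keep file order.
    static List<Map.Entry<String, Integer>> top(List<Map.Entry<String, Integer>> entries, int n) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(entries);
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) throws IOException {
        // Pass a local path to part-r-00000 as the first argument;
        // without one, a few lines from the sample output are used instead.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("XXX\t3", "aaa\t10", "bbb\t5");
        for (Map.Entry<String, Integer> entry : top(parse(lines), 10)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

In my setup the argument would be the plain local path /Users/sunil/hadoop/output/wordcount/part-r-00000, without the file:// prefix used for the Hadoop parameters.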

Comments:

Unknown said...

Really it was a good example. It helped me a lot as a beginner.

Anonymous said...

Thanks Sunil,

Please explain where I need to execute the Mapper and Reducer program. Is there any type of utility where I should execute it?
