- First, I followed the instructions on the Pydoop installation page to install Pydoop on my machine. I ran into some issues during that process, but eventually had Pydoop installed.
- Next, I created a HelloPydoop.py file that contains a mapper function and a reducer function, shown here. The mapper receives one line at a time, together with its line number; it splits the line into words and writes each word to the output with writer.emit(). The reducer receives each word together with its counts in (key, [value1, value2, ...]) form. This differs from Hadoop Streaming, where I have to detect the change in key myself, so this code is much more compact.

    def mapper(lineNumber, line, writer):
        # Emit every word on the line with a count of "1"
        for word in line.split(" "):
            writer.emit(word, "1")

    def reducer(word, incount, writer):
        # incount holds all the counts emitted for this word; add them up
        writer.emit(word, sum(map(int, incount)))

- Once my HelloPydoop.py file is ready, I can invoke it by passing it to pydoop script. Here aesop.txt is the name of the input file in HDFS, and I want the output to be generated in the output/pydoop directory in HDFS.
pydoop script /home/user/PycharmProjects/HelloWorld/Pydoop/HelloPydoop.py aesop.txt output/pydoop
- After the MapReduce job has finished executing, I can look at its output by running this command:
hdfs dfs -cat output/pydoop/part-00000
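To check the mapper and reducer logic before submitting a job, the two functions can be exercised locally without Hadoop. The sketch below is only an approximation of what the framework does: LocalWriter and run_local are hypothetical helpers written for this illustration (they are not part of Pydoop), the grouping step stands in for the shuffle phase, and the tab-separated print format is an assumption about how the part file looks.

```python
from collections import defaultdict


class LocalWriter:
    """Hypothetical stand-in for the writer object passed to mapper/reducer."""

    def __init__(self):
        self.records = []

    def emit(self, key, value):
        # Collect the pair in memory instead of handing it to Hadoop
        self.records.append((key, value))


def mapper(lineNumber, line, writer):
    for word in line.split(" "):
        writer.emit(word, "1")


def reducer(word, incount, writer):
    writer.emit(word, sum(map(int, incount)))


def run_local(lines):
    """Run map, group-by-key, and reduce entirely in memory."""
    map_out = LocalWriter()
    for number, line in enumerate(lines):
        mapper(number, line, map_out)

    # Stand-in for the shuffle phase: collect all values emitted per key
    grouped = defaultdict(list)
    for key, value in map_out.records:
        grouped[key].append(value)

    reduce_out = LocalWriter()
    for key in sorted(grouped):
        reducer(key, grouped[key], reduce_out)
    return reduce_out.records


if __name__ == "__main__":
    for word, count in run_local(["the fox and the crow", "the crow"]):
        print("%s\t%d" % (word, count))
```

Because the reducer is called once per word with all of its counts, no key-change bookkeeping is needed, which is the compactness described in the steps above.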
WordCount MapReduce program using Pydoop (MapReduce framework for Python)
In the "WordCount MapReduce program using Hadoop streaming and python" entry, I used Hadoop Streaming to create a MapReduce program, but that program is quite verbose and has limitations (for example, counters are not easy to use).
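To illustrate where the verbosity comes from, here is a minimal sketch of a Hadoop Streaming style reducer. It is not the code from the earlier entry; the function name streaming_reducer and the sample input are assumptions made for this illustration. It assumes the input arrives as sorted "word<TAB>count" lines, so the reducer has to notice for itself when the key changes.

```python
import io


def streaming_reducer(stdin, stdout):
    """Sum counts from sorted 'word<TAB>count' lines, tracking key changes."""
    current_word = None
    current_count = 0
    for raw in stdin:
        line = raw.rstrip("\n")
        if not line:
            continue
        word, count = line.split("\t", 1)
        if word != current_word:
            # Key changed: flush the total for the previous word
            if current_word is not None:
                stdout.write("%s\t%d\n" % (current_word, current_count))
            current_word = word
            current_count = 0
        current_count += int(count)
    # Flush the last word once the input is exhausted
    if current_word is not None:
        stdout.write("%s\t%d\n" % (current_word, current_count))


if __name__ == "__main__":
    demo_in = io.StringIO("and\t1\ncrow\t1\ncrow\t1\nthe\t1\n")
    demo_out = io.StringIO()
    streaming_reducer(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

In a real streaming job the function would be called with sys.stdin and sys.stdout; the in-memory streams here only keep the example self-contained.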
So I decided to develop the same program using Pydoop, a framework that makes it easier to develop Hadoop MapReduce programs and to work with HDFS. I followed the steps listed above.