Sunil's Notes: WordCount program written in Python using the Spark framework

In the WordCount (HelloWorld) MapReduce program entry I talked about how to build a simple WordCount program using MapReduce. I wanted to try developing the same program with Apache Spark, but in Python, so I followed these steps:

1. Download the version of Spark that matches your Hadoop distribution from the Spark download page. In my case I am using the Cloudera CDH4 VM image for development, so I downloaded the CDH4 build.
2. Extract the archive. I extracted spark-1.0.0-bin-cdh4.tgz into the /home/cloudera/software folder.
3. Write the WordCount.py program. It defines three functions:
   - flatMap: takes a line as input, splits it on spaces and emits the individual words
   - map: takes a word as input and emits a (word, 1) tuple
   - reduce: adds two counts for the same word together

   The line `counts = distFile.flatMap(flatMap).map(map).reduceByKey(reduce)` ties everything together: flatMap turns each line of the input file (`distFile`) into words, map turns each word into a (word, 1) pair, and reduceByKey uses reduce to sum the 1s for each distinct word.
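The WordCount.py program described above can be sketched as follows. This is a minimal version assuming the PySpark `SparkContext` API; the function names and the `counts = ...` line follow the description, while the argument check, the usage message, the `appName` value and the `main` function are assumptions for illustration. `flatMap` uses `str.split()` with no argument, which splits on any run of whitespace rather than on each single space.

```python
import sys


def flatMap(line):
    """Split one input line into words.

    str.split() with no argument treats runs of spaces/tabs as a single
    separator, so no empty strings are emitted.
    """
    return line.split()


def map(word):
    """Emit a (word, 1) pair. Note: this shadows the built-in map in this module."""
    return (word, 1)


def reduce(a, b):
    """Add two partial counts for the same word."""
    return a + b


def main(argv):
    # Expect: WordCount.py <input path> <output path> (assumed argument order).
    if len(argv) != 3:
        sys.stderr.write("Usage: WordCount.py <input path> <output path>\n")
        return

    # Imported here so the helper functions above can be used without Spark.
    from pyspark import SparkContext

    # The master (e.g. local[4]) is supplied by spark-submit, not hard-coded.
    sc = SparkContext(appName="PythonWordCount")
    try:
        distFile = sc.textFile(argv[1])  # RDD with one element per input line
        counts = distFile.flatMap(flatMap).map(map).reduceByKey(reduce)
        # Writes one part-NNNNN file per partition under the output directory.
        counts.saveAsTextFile(argv[2])
    finally:
        sc.stop()


if __name__ == "__main__":
    main(sys.argv)
```

The input path is the first argument and the output path the second, as in the spark-submit command that follows. With the wrong number of arguments the sketch only prints a usage line to stderr and never starts Spark.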

Once WordCount.py is ready, you can run it from the directory where Spark was extracted with spark-submit, passing the path of WordCount.py followed by the input and output paths:


```
./bin/spark-submit --master local[4] /home/cloudera/workspace/spark/HelloSpark/WordCount.py \
  file:///home/cloudera/sorttext.txt file:///home/cloudera/output/wordcount
```

Once the program has finished, you can look at the output with the following command. Spark writes one part-NNNNN file per partition, so the output directory may contain more than one part file:
```
more /home/cloudera/output/wordcount/part-00000
```
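`more` shows a single part file. To check the whole result from Python, the following sketch merges every part-* file in the output directory and compares it with a plain-Python count of the input file. It assumes each output line is a printed (word, count) tuple, as `saveAsTextFile` writes for tuple elements, and that the job splits lines on whitespace; `load_counts` and `local_counts` are hypothetical helper names.

```python
import ast
import glob
import io
import os
from collections import Counter


def load_counts(output_dir):
    """Merge the (word, count) pairs from every part-* file in output_dir.

    Assumes each line is the printed form of a Python tuple, such as
    "('spark', 3)" or, when written from Python 2, "(u'spark', 3)";
    ast.literal_eval parses both forms without executing code.
    Hidden .crc files and _SUCCESS do not match the part-* pattern.
    """
    counts = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with io.open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                word, count = ast.literal_eval(line)
                counts[word] = counts.get(word, 0) + count
    return counts


def local_counts(input_path):
    """Count words in a local text file without Spark.

    Splits each line with str.split() (any run of whitespace separates words).
    """
    counter = Counter()
    with io.open(input_path, encoding="utf-8") as f:
        for line in f:
            counter.update(line.split())
    return dict(counter)
```

For example, `load_counts("/home/cloudera/output/wordcount") == local_counts("/home/cloudera/sorttext.txt")` should hold when both sides split words the same way.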

Comments:

Unknown said...: Hi there,

When I use your example without the code that specifies the output file, the output is printed to the terminal. But when I add the output path, there is no output, and the terminal responds with: "Usage: wordcount ".

Can you help me with this?; December 12, 2017 at 4:43 PM
Liam Santos said...: Thank you for this; January 3, 2022 at 10:39 PM