WordCount program written in Python using the Apache Spark framework

In the WordCount(HelloWorld) MapReduce program entry, I talked about how to build a simple WordCount program using MapReduce. I wanted to try developing the same program with Apache Spark, this time in Python, so I followed these steps:
  1. Download the version of Spark that is appropriate for your Hadoop distribution from the Spark Download page. In my case I am using the Cloudera CDH4 VM image for development, so I downloaded the CDH4 version
  2. I extracted spark-1.0.0-bin-cdh4.tgz into the /home/cloudera/software folder
  3. The next step is to build a WordCount.py program. The program has 3 methods
    • flatMap: This method takes a line as input, splits it on spaces and emits the resulting words
    • map: This method takes a word as input and emits a tuple in (word, 1) format
    • reduce: This method adds the counts for the same word together
    The line counts = distFile.flatMap(flatMap).map(map).reduceByKey(reduce) ties everything together
  4. Once WordCount.py is ready, you can execute it with spark-submit, passing the path of WordCount.py followed by the input and output paths
    
     ./bin/spark-submit --master local[4] /home/cloudera/workspace/spark/HelloSpark/WordCount.py \
       file:///home/cloudera/sorttext.txt file:///home/cloudera/output/wordcount
    
  5. Once the program has finished executing, you can take a look at the output by running the following command
    
    more /home/cloudera/output/wordcount/part-00000
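
The WordCount.py described in step 3 could look roughly like the sketch below. This is a minimal sketch and not necessarily the exact program used above: the function names split_line, to_pair and add_counts are my own (they stand in for the flatMap, map and reduce methods, and avoid shadowing Python's built-in map), the wording of the usage message is an assumption, and it relies on the PySpark calls SparkContext.textFile, flatMap, map, reduceByKey and saveAsTextFile.

```python
import sys


def split_line(line):
    """flatMap step: split one line of text on spaces into words."""
    return line.split(" ")


def to_pair(word):
    """map step: turn a word into a (word, 1) tuple."""
    return (word, 1)


def add_counts(a, b):
    """reduce step: add two partial counts for the same word."""
    return a + b


def main(argv):
    # Expect exactly two arguments: the input path and the output path.
    if len(argv) != 3:
        # The wording of this usage message is an assumption.
        sys.stderr.write("Usage: wordcount <input> <output>\n")
        return

    # Imported here so the helper functions above can be tried without Spark.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")
    dist_file = sc.textFile(argv[1])
    # Tie the three steps together, as in the line quoted in step 3.
    counts = dist_file.flatMap(split_line).map(to_pair).reduceByKey(add_counts)
    counts.saveAsTextFile(argv[2])
    sc.stop()


if __name__ == "__main__":
    main(sys.argv)
```

Two things to keep in mind when running a program like this: saveAsTextFile fails if the output folder already exists, and it writes one part file per partition, so the output folder may contain more than just part-00000.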
    

Comments:

Henry Zhang said...

Hi There,

When I use your example without the code specifying the output file, the output is printed to the terminal. But when I added the output path, there is no output, and the terminal responds with: "Usage: wordcount ".

Can you help me with this?
