Using Apache Oozie to automate a streaming map-reduce job

In the WordCount MapReduce program using Hadoop streaming and python post I talked about how to create a streaming map-reduce job using Python. I wanted to figure out how to automate that program with an Oozie workflow, so I followed these steps:
  1. The first step was to create a folder called streaming on my local machine and copy mapper.py and reducer.py into it. I also created placeholders for job.properties and workflow.xml.
  2. Next I created the job.properties file. It is quite similar to the job.properties for a Java MapReduce job; the only difference is that you must set oozie.use.system.libpath=true. By default the streaming-related jars are not included in the classpath, so unless you set that value to true you will get the following error:
    
    2014-07-23 06:15:13,170 WARN org.apache.hadoop.mapred.Child: Error running child
    java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
     at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1010)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1641)
     ... 8 more
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1523)
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1615)
     ... 9 more
    2014-07-23 06:15:13,175 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
    
  3. The next step is to create the workflow.xml file. Make sure to add a <file> element such as <file>mapper.py#mapper.py</file> for each script; this takes care of shipping mapper.py and reducer.py to the tasks through the distributed cache and creating symbolic links to the two files.
  4. Upload the streaming folder with all your changes to HDFS by executing the following command:
    
    hdfs dfs -put streaming streaming
    
  5. You can trigger the Oozie workflow by executing the following command:
    
    oozie job -oozie http://localhost:11000/oozie -config streaming/job.properties -run
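
The contents of job.properties and workflow.xml are not listed in the steps above, so here is a minimal sketch of what they might look like, wrapped in a small Python script that can write both files into the streaming folder. Everything cluster-specific is an assumption on my part: the host names and ports, the queue name, the workflow schema version (0.4), the input and output directories, and the "python mapper.py" / "python reducer.py" commands are placeholders you would adjust for your own setup. The two helper functions only check the parts the steps call out: the oozie.use.system.libpath setting and the <file> elements.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical cluster settings: hosts, ports and paths are placeholders,
# not values taken from the original post.
JOB_PROPERTIES = """\
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
# Needed so the streaming jars from the Oozie share lib are on the classpath
oozie.use.system.libpath=true
# Matches "hdfs dfs -put streaming streaming" (the user's HDFS home directory)
oozie.wf.application.path=${nameNode}/user/${user.name}/streaming
"""

# Assumed schema version, commands and input/output directories.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="streaming-wordcount">
    <start to="streaming-node"/>
    <action name="streaming-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/${wf:user()}/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/${wf:user()}/output</value>
                </property>
            </configuration>
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def shipped_files(workflow_xml):
    """Return the text of every <file> element inside the map-reduce action."""
    ns = {"wf": "uri:oozie:workflow:0.4"}
    root = ET.fromstring(workflow_xml)
    return [f.text for f in root.findall(".//wf:map-reduce/wf:file", ns)]


def write_configs(folder):
    """Write job.properties and workflow.xml into the given folder."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    (target / "job.properties").write_text(JOB_PROPERTIES)
    (target / "workflow.xml").write_text(WORKFLOW_XML)
```

Calling write_configs("streaming") before the upload step would fill in the two placeholder files. The scripts are referenced by relative path in the <file> elements, which assumes they sit next to workflow.xml in the uploaded folder.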
    
