Creating an Oozie workflow for a MapReduce job that uses the distributed cache
In the "Using third part jars and files in your MapReduce application(Distributed cache)" entry, I blogged about how to create a MapReduce job that uses the distributed cache for both the required jar files and the data files the job reads. I wanted to figure out how to automate this MapReduce job using Apache Oozie, so I followed these steps:
- First, I created an apachelog directory and, inside it, a job.properties file.
- Then I created a workflow.xml file. One thing to notice in it is this element:
<file>GeoLite.mmdb#GeoLite2-City.mmdb</file>
The file on disk is named GeoLite.mmdb, but my program refers to it as GeoLite2-City.mmdb. The file element takes care of creating a symlink with that name in the working directory of each task.
- Then I copied all the required jar files into the lib folder of the apachelog directory.
- I used the following command to copy the apachelog directory, which contains everything my Oozie job needs, to HDFS. Because the destination path is relative, it ends up under my HDFS home directory (/user/<username>/apachelog):
hdfs dfs -put apachelog apachelog
- The last step is to invoke the Oozie job by executing the following command:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
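For reference, a minimal workflow.xml for this kind of job might look roughly like the sketch below. It is an illustration under stated assumptions, not the original file: it assumes a map-only job written against the new MapReduce API, and the workflow and action names, the mapper class com.example.ApacheLogMapper, and the inputDir/outputDir parameters are hypothetical placeholders. The ${...} variables are expected to come from job.properties; the comment at the top shows example values for a single-node setup, which are also assumptions. Because GeoLite.mmdb is given as a relative path in the `<file>` element, Oozie looks for it inside the workflow application directory on HDFS (the copied apachelog directory); an absolute HDFS path can be used instead.

```xml
<!--
  Example job.properties entries this sketch assumes (values are placeholders):
    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8032
    queueName=default
    inputDir=${nameNode}/user/${user.name}/apachelog/input
    outputDir=${nameNode}/user/${user.name}/apachelog/output
    oozie.wf.application.path=${nameNode}/user/${user.name}/apachelog
-->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="apachelog-wf">
    <start to="apachelog-mr"/>
    <action name="apachelog-mr">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Remove output from a previous run so the job can be rerun -->
            <prepare>
                <delete path="${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- Tell Oozie the mapper uses the new (org.apache.hadoop.mapreduce) API -->
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <!-- Hypothetical mapper class; replace with the job's real one -->
                <property>
                    <name>mapreduce.map.class</name>
                    <value>com.example.ApacheLogMapper</value>
                </property>
                <!-- Map-only job: no reducers -->
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>0</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
            <!-- Ship GeoLite.mmdb via the distributed cache, symlinked as GeoLite2-City.mmdb -->
            <file>GeoLite.mmdb#GeoLite2-City.mmdb</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Any other settings the job's driver normally sets (for example output key/value classes) would be added as further properties in the same way. Jar files in the lib folder under the application path are added to the job's classpath automatically, which is why copying them there is enough; oozie.wf.application.path in job.properties is what tells Oozie where that directory lives on HDFS.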
2 comments:
Where is the location of the distributed cache file? I mean, should it be in HDFS? Can you please help?
Thanks for the info.