
Importing data from RDBMS into Hive using Sqoop and oozie (hive-import)

In the How to run Sqoop command from oozie entry I talked about how you can use Oozie and Sqoop to import data into HDFS. I wanted to change that workflow to use Sqoop's hive-import option, which in addition to importing data into HDFS also creates a Hive table on top of the data. These are the steps that I followed
  • First I changed the workflow.xml to take out the as-avrodatafile option and add the hive-import option, and re-ran the workflow. When I did that, the Oozie workflow failed with the following error
    
    7936 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
    9202 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DBConfiguration  - Fetching password from job credentials store
    9207 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
    9210 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat  - Creating input split with lower bound '1=1' and upper bound '1=1'
    25643 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 931.1768 KB in 17.6994 seconds (52.6107 KB/sec)
    25649 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 12435 records.
    25649 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Hive.inputTable: customers
    25650 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Hive.outputTable: customers
    25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Execute getColumnInfoRawQuery : SELECT t.* FROM `customers` AS t LIMIT 1
    25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - No connection paramenters specified. Using regular API for making connection.
    25658 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Using fetchSize for next query: -2147483648
    25658 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_id of type [4, 11, 0]
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_fname of type [12, 45, 0]
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_lname of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_email of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_password of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_street of type [12, 255, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_city of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_state of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_zipcode of type [12, 45, 0]
    25663 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter  - Create statement: CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
    25664 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter  - Load statement: LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/customers' INTO TABLE `customers`
    25667 [uber-SubtaskRunner] INFO  org.apache.sqoop.hive.HiveImport  - Loading uploaded data into Hive
    25680 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Using in-process Hive instance.
    25683 [uber-SubtaskRunner] DEBUG org.apache.sqoop.util.SubprocessSecurityManager  - Installing subprocess security manager
    Intercepting System.exit(1)
    
    <<< Invocation of Main class completed <<<
    
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
    
    Oozie Launcher failed, finishing Hadoop job gracefully
    
    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000007-161222163830473-oozie-oozi-W/sqoop-52c0--sqoop/action-data.seq
    
    Oozie Launcher ends
    
    
  • As you can see from the log, the Sqoop job was able to import data into HDFS under the /user/cloudera/customers directory, and I could actually see the data there. But when Sqoop tried to create the table in Hive it failed, and the table never got created. This is the log statement I am referring to: CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
  • So it seems the problem is that Sqoop needs hive-site.xml so that it knows how to talk to the Hive service. To fix that, I first searched my sandbox to figure out where hive-site.xml is located and then uploaded it to HDFS:

    sudo find / -name hive-site.xml
    hdfs dfs -put /etc/hive/conf.dist/hive-site.xml
  • After that I went back to the workflow.xml and modified it to pass hive-site.xml to the Sqoop action, as shown in the sketch below
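The workflow.xml embedded in the original post did not survive here, so below is a minimal sketch of the final version, assuming a Cloudera quickstart style setup. The connection string, credentials and the HDFS location of hive-site.xml are placeholders; the parts that matter are the hive-import flag in the Sqoop command and the <file> element that ships hive-site.xml with the action.

    <workflow-app name="sqoop-hive-import-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="sqoop-import"/>
        <action name="sqoop-import">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <!-- connect string, user and password are placeholders for this sketch -->
                <command>import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --hive-import</command>
                <!-- the fix: make the hive-site.xml uploaded to HDFS available to the action -->
                <file>/user/cloudera/hive-site.xml#hive-site.xml</file>
            </sqoop>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Sqoop hive-import failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>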
Now when I ran the Oozie workflow it was successful and I could query the customer data from Hive.

How to run Sqoop command from oozie

In the Importing data from Sqoop into Hive External Table with Avro encoding updated entry I blogged about how you can use Sqoop to import data from an RDBMS into Hadoop. I wanted to test whether I could invoke a Sqoop command from Oozie, and I followed these steps to do that.
  1. First I tried executing this command from the command line on my Hadoop cluster, to make sure that I could actually run Sqoop without any problem
    
    sqoop import --connect jdbc:mysql://localhost/test \
    --username root \
    --password cloudera \
    --table CUSTOMER \
    --as-avrodatafile
    
  2. Once the Sqoop command executed successfully, I went back and deleted the CUSTOMER directory from HDFS, so that I could re-import the data, using the following command
    
    hdfs dfs -rm -R CUSTOMER
    
  3. Next I went to Hue to create an Oozie workflow with the single Sqoop command that I had executed before.
    If you're not using the Hue console, you can create workflow.xml manually and also create a job.properties file; sketches of both files are included at the end of this post. Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run an Oozie job from the command line
  4. Next, when I ran the Oozie workflow, the job failed with the following error, which indicates that Oozie does not have the MySQL JDBC driver.
    
    java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
     at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
     at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
     at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
     at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
     at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
     at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
     at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
     at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
     at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846)
     at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646)
     at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
     at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
     at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
     at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
    
  5. So the first thing I did was check whether the MySQL driver is present in the Oozie shared lib, by executing the following commands
    
    export OOZIE_URL=http://localhost:11000/oozie
    oozie admin -shareliblist sqoop
    
    I noticed that mysql-connector-java.jar was not in the list of shared libs for Oozie + Sqoop
  6. The next step was to find mysql-connector-java.jar on my sandbox, which I did like this
    
    sudo find / -name "mysql*"
    
    I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar
  7. I wanted to update the Oozie shared lib to include the MySQL driver jar, so I executed the following command to figure out which HDFS directory the Oozie shared lib lives in
    
    oozie admin -sharelibupdate
    
    From this output I got the HDFS directory location of the Oozie shared lib, which is /user/oozie/share/lib/lib_20160406022812
  8. Then I used the following two commands to first copy the DB driver into the Oozie shared lib and then make sure it is accessible to other users

    hdfs dfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/.
    hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
  9. The last step was to let Oozie know that it should reload the shared lib, which I did by executing the following two commands
    
    oozie admin -sharelibupdate
    oozie admin -shareliblist sqoop | grep mysql*
    
    The second command queries Oozie for the current list of shared jars, and this time I could see mysql-connector-java.jar listed in it
When I re-executed the Oozie job it ran successfully.
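Since the embedded workflow.xml from step 3 did not survive in this post, here is a minimal sketch of it. The action simply wraps the same Sqoop command that was tested from the shell in step 1; the workflow and action names are just placeholders.

    <workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="sqoop-import"/>
        <action name="sqoop-import">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <!-- same import command that was tested from the command line in step 1 -->
                <command>import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile</command>
            </sqoop>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Sqoop import failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>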
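And a matching job.properties sketch. The host names, ports and application path are assumptions for a Cloudera quickstart style VM; oozie.use.system.libpath=true is what makes the Sqoop shared lib (including the MySQL driver added above) visible to the action.

    nameNode=hdfs://quickstart.cloudera:8020
    jobTracker=quickstart.cloudera:8032
    queueName=default
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/cloudera/sqoop-import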

Running oozie job on Hortonworks Sandbox

In the Enabling Oozie console on Cloudera VM 4.4.0 and executing examples entry I blogged about how to run an Oozie job on the Cloudera sandbox. This process is a little bit easier on the HortonWorks 2.2 sandbox. I had a brand new HDP 2.2 image and I tried running the Oozie example on it by executing

oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
But when I tried running it I got the following error

Error: E0501 : E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
So I looked into /var/log/oozie/oozie.log and saw the following error

2015-05-01 20:34:39,195  WARN V1JobsServlet:546 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] URL[POST http://sandbox.hortonworks.com:11000/oozie/v2/jobs?action=start] error[E0501], E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.oozie.servlet.XServletException: E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
 at org.apache.oozie.servlet.BaseJobServlet.checkAuthorizationForApp(BaseJobServlet.java:240)
 at org.apache.oozie.servlet.BaseJobsServlet.doPost(BaseJobsServlet.java:96)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
 at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:287)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:143)
 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
 at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:148)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.oozie.service.AuthorizationException: E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
 at org.apache.oozie.service.AuthorizationService.authorizeForApp(AuthorizationService.java:399)
 at org.apache.oozie.servlet.BaseJobServlet.checkAuthorizationForApp(BaseJobServlet.java:229)
 ... 25 more
Caused by: java.net.ConnectException: Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
 at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy29.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy30.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
 at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
 at org.apache.oozie.service.AuthorizationService.authorizeForApp(AuthorizationService.java:371)
 ... 26 more
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
 at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
 at org.apache.hadoop.ipc.Client.call(Client.java:1438)
 ... 44 more
In order to solve this issue I had to edit examples/apps/map-reduce/job.properties and replace localhost with sandbox.hortonworks.com


nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8032

queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce

Using Apache Oozie for automating streaming map-reduce job

In the WordCount MapReduce program using Hadoop streaming and python entry I talked about how to create a streaming map-reduce job using Python. I wanted to figure out how to automate that program using an Oozie workflow, so I followed these steps
  1. The first step was to create a folder called streaming on my local machine and copy mapper.py and reducer.py into it. I also created placeholders for job.properties and workflow.xml
  2. Next I created a job.properties file (a sketch is included at the end of this post). This job.properties is quite similar to the one for a Java map-reduce job; the only difference is that you must set oozie.use.system.libpath=true. By default the streaming-related jars are not included in the classpath, so unless you set that value to true you will get the following error
    
    2014-07-23 06:15:13,170 WARN org.apache.hadoop.mapred.Child: Error running child
    java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.Pi
    peMapRunner not found
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
     at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1010)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not f
    ound
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1641)
     ... 8 more
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1523)
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1615)
     ... 9 more
    2014-07-23 06:15:13,175 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
    
  3. The next step in the process is to create the workflow.xml file (also sketched at the end of this post). Make sure to add the <file>mapper.py#mapper.py</file> element in the workflow.xml, which takes care of shipping mapper.py and reducer.py to the task nodes and creating symbolic links to these two files.
  4. Upload the streaming folder with all your changes to HDFS by executing the following command
    
    hdfs dfs -put streaming streaming
    
  5. You can trigger the Oozie workflow by executing the following command
    
    oozie job -oozie http://localhost:11000/oozie -config streaming/job.properties -run
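Since the embedded job.properties from step 2 did not survive in this post, here is a minimal sketch. The host names, ports and paths are assumptions for a single-node Cloudera style VM; the line that matters for streaming is oozie.use.system.libpath=true.

    nameNode=hdfs://localhost.localdomain:8020
    jobTracker=localhost.localdomain:8021
    queueName=default
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/${user.name}/streaming
    inputDir=/user/${user.name}/input
    outputDir=/user/${user.name}/output/streaming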
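And a sketch of the streaming workflow.xml from step 3, using the inputDir and outputDir variables from the job.properties sketch above. The action name and locations are assumptions; the important pieces are the <streaming> block and the <file> elements that ship the Python scripts with the job.

    <workflow-app name="streaming-wordcount-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="streaming-wordcount"/>
        <action name="streaming-wordcount">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <streaming>
                    <mapper>python mapper.py</mapper>
                    <reducer>python reducer.py</reducer>
                </streaming>
                <configuration>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${inputDir}</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${outputDir}</value>
                    </property>
                </configuration>
                <!-- ship the scripts with the job and create symlinks next to the tasks -->
                <file>mapper.py#mapper.py</file>
                <file>reducer.py#reducer.py</file>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Streaming job failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>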
    

Using Apache Oozie to execute MapReduce jobs

I wanted to learn how to automate a MapReduce job using Oozie, so I decided to create an Oozie workflow to invoke the WordCount (HelloWorld) MapReduce program. I had to follow these steps
  1. The first thing I did was download the WordCount program source code by executing
    
    git clone https://github.com/sdpatil/HadoopWordCount3
    
    This program has a Maven script for building an executable jar, so I used the mvn clean package command to build the Hadoop jar.
  2. After that I tried executing the program manually using the following command
    
    hadoop jar target/HadoopWordCount.jar sorttest.txt output/wordcount
    
  3. Now, in order to use an Oozie workflow, you have to create a particular folder structure on your machine
    
    wordcount
       -- job.properties
       -- workflow.xml
       -- lib
             -- HadoopWordCount.jar  
    
  4. In the wordcount folder create a job.properties file (a sketch is included at the end of this post). This file lets you pass parameters to your Oozie workflow. The values of nameNode and jobTracker point to the name node and job tracker locations; in my case I am using the Cloudera VM with a single node, so both properties point to localhost. The value of oozie.wf.application.path is the HDFS path where you uploaded the wordcount folder created in step 3
  5. Next define your Apache Oozie workflow.xml file (also sketched at the end of this post). In my case the workflow has a single step, which is to execute the MapReduce job. I am setting the following properties:
    • mapred.mapper.new-api & mapred.reducer.new-api: Set these properties to true if you're using the new MapReduce API based on the org.apache.hadoop.mapreduce.* classes
    • mapreduce.map.class: The fully qualified name of your mapper class
    • mapreduce.reduce.class: The fully qualified name of your reducer class
    • mapred.output.key.class: Fully qualified name of the output key class. This is the same as the parameter to job.setOutputKeyClass() in your driver class
    • mapred.output.value.class: Fully qualified name of the output value class. This is the same as the parameter to job.setOutputValueClass() in your driver class
    • mapred.input.dir: Location of your input file. In my case I have sorttext.txt in the hdfs://localhost/user/cloudera directory
    • mapred.output.dir: Location of the output that will get generated. In my case I want the output to go to the hdfs://localhost/user/cloudera/output/wordcount directory
  6. Once your Oozie workflow is ready, upload the wordcount folder to HDFS by executing the following command

    hdfs dfs -put wordcount wordcount
    
  7. Now run your Oozie workflow by executing the following command from your wordcount directory

    oozie job -oozie http://localhost:11000/oozie -config job.properties -run

    If it runs successfully you should see the output generated in the hdfs://localhost/user/cloudera/output/wordcount directory
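For reference, here is a minimal job.properties sketch for step 4. The ports are assumptions for a single-node Cloudera VM; adjust them for your cluster.

    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8021
    queueName=default
    oozie.wf.application.path=${nameNode}/user/cloudera/wordcount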
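And a sketch of the workflow.xml for step 5. The mapper and reducer class names are placeholders (use the classes packaged in HadoopWordCount.jar), and the output key/value classes assume the usual Text/IntWritable word-count output.

    <workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="wordcount"/>
        <action name="wordcount">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property><name>mapred.mapper.new-api</name><value>true</value></property>
                    <property><name>mapred.reducer.new-api</name><value>true</value></property>
                    <!-- placeholder class names: substitute the classes from HadoopWordCount.jar -->
                    <property><name>mapreduce.map.class</name><value>com.example.WordCountMapper</value></property>
                    <property><name>mapreduce.reduce.class</name><value>com.example.WordCountReducer</value></property>
                    <property><name>mapred.output.key.class</name><value>org.apache.hadoop.io.Text</value></property>
                    <property><name>mapred.output.value.class</name><value>org.apache.hadoop.io.IntWritable</value></property>
                    <property><name>mapred.input.dir</name><value>/user/cloudera/sorttext.txt</value></property>
                    <property><name>mapred.output.dir</name><value>/user/cloudera/output/wordcount</value></property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>WordCount failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>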

Enabling Oozie console on Cloudera VM 4.4.0 and executing examples

I am trying to learn about Apache Oozie, so I wanted to figure out how to use it on the Cloudera 4.4.0 VM. When you go to the Oozie web console it shows a message saying that the console is disabled. In order to enable the console I had to follow these steps
  1. I went to Cloudera Manager, opened the Oozie configuration screen and checked the Enable Oozie Server Web Console option. As you can see in the description, it says to install ExtJS 2.2 in /usr/lib/oozie/libext
  2. Next I went to the /usr/lib/oozie/libext directory and executed the following command to download ext-2.2.zip.
    
    wget 'http://extjs.com/deploy/ext-2.2.zip'
    
    Since I am using CDH 4.4, I had to unzip it by executing unzip ext-2.2.zip
  3. The last step was to restart the Oozie service, and after that I could see the Oozie web console
Executing oozie examples

After the Oozie console was enabled I wanted to execute an Oozie example to test my installation, so I followed these steps
  1. The first thing was to find the oozie-examples.tar.gz file on my VM
    
    find / -name oozie-examples.tar.gz
    
    I found it under the /usr/share/doc/oozie-3.3.2+92/ directory, so I extracted it using tar xvf oozie-examples.tar.gz
  2. Then I had to change the values of nameNode and jobTracker in job.properties from localhost to localhost.localdomain, to get rid of the Error: E0901 : E0901: Namenode [localhost:8020] not allowed, not in Oozies whitelist error.
    
    nameNode=hdfs://localhost.localdomain:8020
    jobTracker=localhost.localdomain:8021
    queueName=default
    examplesRoot=examples
    
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
    outputDir=map-reduce
    
  3. After making the changes in job.properties I uploaded the examples folder to HDFS using the following command
    
    hdfs dfs -put examples examples
    
  4. The last step in the process was to actually run the map-reduce job in Oozie by executing the following command
    
    oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
    
  5. Once the job was started I could follow its progress in the Oozie web console