- First I changed the workflow.xml to take out the --as-avrodatafile option and added the --hive-import option, and then I re-ran the workflow. When I did that the Oozie workflow failed with the following error:

7936 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
9202 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DBConfiguration - Fetching password from job credentials store
9207 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
9210 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '1=1' and upper bound '1=1'
25643 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 931.1768 KB in 17.6994 seconds (52.6107 KB/sec)
25649 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 12435 records.
25649 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Hive.inputTable: customers
25650 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Hive.outputTable: customers
25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Execute getColumnInfoRawQuery : SELECT t.* FROM `customers` AS t LIMIT 1
25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - No connection paramenters specified. Using regular API for making connection.
25658 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Using fetchSize for next query: -2147483648
25658 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_id of type [4, 11, 0]
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_fname of type [12, 45, 0]
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_lname of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_email of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_password of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_street of type [12, 255, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_city of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_state of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_zipcode of type [12, 45, 0]
25663 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter - Create statement: CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
25664 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter - Load statement: LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/customers' INTO TABLE `customers`
25667 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
25680 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Using in-process Hive instance.
25683 [uber-SubtaskRunner] DEBUG org.apache.sqoop.util.SubprocessSecurityManager - Installing subprocess security manager
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000007-161222163830473-oozie-oozi-W/sqoop-52c0--sqoop/action-data.seq
Oozie Launcher ends
- As you can see from the log, the Sqoop job was able to import data into HDFS in the /user/cloudera/customers directory, and I could actually see the data in that directory. But when Sqoop tried to create the table in Hive it failed, and the table did not get created in Hive. This is the log statement that I am referring to:

CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
- So it seems the problem is that Sqoop needs hive-site.xml so that it knows how to talk to the Hive service. To fix that, I first searched my sandbox to figure out where hive-site.xml is located, and then executed the following commands to find hive-site.xml and upload it to HDFS:

sudo find / -name hive-site.xml
hdfs dfs -put /etc/hive/conf.dist/hive-site.xml
- After that I went back to the workflow.xml and modified it to look like this.
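A minimal sketch of what that modified workflow.xml might look like (the JDBC connect string, the credentials, and the HDFS location of hive-site.xml are assumptions; adjust them for your environment):

<!-- Sketch of a Sqoop action using --hive-import with hive-site.xml attached via a file element -->
<workflow-app name="sqoop-hive-import-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop-import"/>
    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- connect string and credentials below are placeholders -->
            <command>import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --hive-import</command>
            <!-- makes the hive-site.xml uploaded to HDFS earlier visible to the Sqoop action -->
            <file>/user/cloudera/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop hive-import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>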
Importing data from RDBMS into Hive using Sqoop and oozie (hive-import)
In the How to run Sqoop command from oozie entry I talked about how you can use Oozie and Sqoop to import data into HDFS. I wanted to change it to use Sqoop's hive-import option, which in addition to importing data into HDFS also creates a Hive table on top of the data. These are the steps that I followed.
How to run Sqoop command from oozie
In the Importing data from Sqoop into Hive External Table with Avro encoding updated entry I blogged about how you can use Sqoop to import data from an RDBMS into Hadoop. I wanted to test whether I could use Oozie to invoke a Sqoop command, and I followed these steps to do that.
- First I tried executing this command from my command line on the Hadoop cluster to make sure that I could actually run Sqoop without any problem:
sqoop import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile
- Once the Sqoop command executed successfully, I went back and deleted the CUSTOMER directory from HDFS so that I could re-import the data, using the following command:
hdfs dfs -rm -R CUSTOMER
- Next I went to Hue to create an Oozie workflow with the single Sqoop command that I had executed before. If you are not using the Hue console, you can create workflow.xml manually, and make sure to also create a job.properties file; hedged sketches of both are shown below. Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run an Oozie job from the command line.
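Here is a minimal sketch of both files, assuming the workflow folder is uploaded to /user/cloudera/sqoop-import on HDFS and that the cluster uses the Cloudera quickstart hostname and default ports (all of these values are assumptions; adjust them for your setup). First the workflow.xml:

<!-- Sketch of a workflow with a single Sqoop action running the command used above -->
<workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop-import"/>
    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

And a matching job.properties:

# hostnames, ports, and the HDFS application path are assumptions
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032
queueName=default
# puts the Oozie sqoop sharelib on the classpath of the launcher job
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/cloudera/sqoop-import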
- Next, when I ran the Oozie workflow, the job failed with the following error, which indicates that Oozie does not have the MySQL JDBC driver.
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
- So the first thing I did was to check whether the MySQL driver is present in the Oozie shared lib, by executing the following commands:

export OOZIE_URL=http://localhost:11000/oozie
oozie admin -shareliblist sqoop

I noticed that mysql-connector-java.jar was not there in the list of shared libs for Oozie + Sqoop.
- The next step was to find mysql-connector-java.jar in my sandbox, which I could do like this:

sudo find / -name mysql*

I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar.
- I wanted to update the Oozie shared lib to include the MySQL driver jar, so I executed the following command to figure out the directory where the Oozie Sqoop shared lib is:

oozie admin -sharelibupdate

From this output I got the HDFS directory location for the Oozie shared lib, which is /user/oozie/share/lib/lib_20160406022812.
- Then I used the following two commands to first copy the db driver into the Oozie shared lib and then make sure it is accessible to other users:

hdfs dfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/.
hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
- The last step was to let Oozie know that it should reload the shared lib, which I did by executing the following two commands:

oozie admin -sharelibupdate
oozie admin -shareliblist sqoop | grep mysql*

The second command queries Oozie for the current list of shared jars, and I could see mysql-connector-java.jar listed in it.
Running oozie job on Hortonworks Sandbox
In the Enabling Oozie console on Cloudera VM 4.4.0 and executing examples entry I blogged about how to run an Oozie job in the Cloudera sandbox. It seems this process is a little bit easier in the Hortonworks 2.2 sandbox.
So first I took a brand new HDP 2.2 image and tried running the Oozie example on it by executing
oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
But when I tried running it I got the following error:
Error: E0501 : E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
So I looked into /var/log/oozie/oozie.log and saw the following error:
2015-05-01 20:34:39,195 WARN V1JobsServlet:546 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] URL[POST http://sandbox.hortonworks.com:11000/oozie/v2/jobs?action=start] error[E0501], E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.oozie.servlet.XServletException: E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.oozie.servlet.BaseJobServlet.checkAuthorizationForApp(BaseJobServlet.java:240)
at org.apache.oozie.servlet.BaseJobsServlet.doPost(BaseJobsServlet.java:96)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:287)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:143)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:148)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.oozie.service.AuthorizationException: E0501: Could not perform authorization operation, Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.oozie.service.AuthorizationService.authorizeForApp(AuthorizationService.java:399)
at org.apache.oozie.servlet.BaseJobServlet.checkAuthorizationForApp(BaseJobServlet.java:229)
... 25 more
Caused by: java.net.ConnectException: Call From sandbox.hortonworks.com/10.0.2.15 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy29.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy30.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
at org.apache.oozie.service.AuthorizationService.authorizeForApp(AuthorizationService.java:371)
... 26 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 44 more
In order to solve this issue I had to make changes in examples/apps/map-reduce/job.properties, replacing localhost with sandbox.hortonworks.com:
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
Using Apache Oozie for automating streaming map-reduce job
In the WordCount MapReduce program using Hadoop streaming and python entry I talked about how to create a streaming map-reduce job using Python. I wanted to figure out how to automate that program using an Oozie workflow, so I followed these steps.
- The first step was to create a folder called streaming on my local machine and copy mapper.py and reducer.py into it; I also created placeholders for job.properties and workflow.xml.
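A minimal sketch of that local setup, assuming mapper.py and reducer.py from the earlier post are in the current directory:

mkdir streaming
cp mapper.py reducer.py streaming/
touch streaming/job.properties streaming/workflow.xml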
- Next I created a job.properties file, sketched below after the note about oozie.use.system.libpath. This job.properties is quite similar to the job.properties for a Java MapReduce job; the only difference is that you must set oozie.use.system.libpath=true. By default the streaming-related jars are not included in the classpath, so unless you set that value to true you will get the following error:

2014-07-23 06:15:13,170 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1010)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1641)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1523)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1615)
... 9 more
2014-07-23 06:15:13,175 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
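Here is a minimal sketch of such a job.properties, assuming a single-node CDH VM and assuming the streaming folder gets uploaded to the user's HDFS home directory as shown in a later step (the host names and ports are assumptions):

# job.properties for the streaming workflow; host/port values are assumptions
nameNode=hdfs://localhost.localdomain:8020
jobTracker=localhost.localdomain:8021
queueName=default
# required so the hadoop-streaming jars from the Oozie sharelib are on the classpath
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/streaming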
- The next step in the process is to create the workflow.xml file. Make sure to add a <file>mapper.py#mapper.py</file> element (and the equivalent for reducer.py) in the workflow.xml, which takes care of shipping mapper.py and reducer.py with the job and creating symbolic links to these two files.
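Here is a minimal sketch of what the streaming action might look like; the input and output paths are assumptions, so adjust them for your cluster:

<!-- Sketch of a streaming map-reduce action; input/output paths are placeholders -->
<workflow-app name="streaming-wordcount-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="streaming-wordcount"/>
    <action name="streaming-wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/cloudera/sorttest.txt</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/cloudera/output/streaming-wordcount</value>
                </property>
            </configuration>
            <!-- ship the scripts with the job and create symlinks named mapper.py and reducer.py -->
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming map-reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>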
- Upload the streaming folder with all your changes to HDFS by executing the following command:
hdfs dfs -put streaming streaming
- You can trigger the Oozie workflow by executing the following command:
oozie job -oozie http://localhost:11000/oozie -config streaming/job.properties -run
Using Apache Oozie to execute MapReduce jobs
I wanted to learn how to automate a MapReduce job using Oozie, so I decided to create an Oozie workflow to invoke the WordCount (HelloWorld) MapReduce program. I had to follow these steps.
- The first thing I did was to download the WordCount program source code by executing:

git clone https://github.com/sdpatil/HadoopWordCount3

This program has a Maven script for building an executable jar, so I used the following command to build the Hadoop jar:

mvn clean package
- After that I tried executing the program manually by using the following command:
hadoop jar target/HadoopWordCount.jar sorttest.txt output/wordcount
- Now in order to use the Oozie workflow you will have to create a particular folder structure on your machine:
wordcount
  job.properties
  workflow.xml
  lib
    HadoopWordCount.jar
- In the wordcount folder create a job.properties file (a hedged sketch is shown below). This file lets you pass parameters to your Oozie workflow. The values of nameNode and jobTracker represent the name node and job tracker locations; in my case I am using a Cloudera VM with a single node, so both of these properties point to localhost. The value of oozie.wf.application.path is the HDFS path where you uploaded the wordcount folder created in the previous step.
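A minimal sketch of that job.properties (the port numbers and the application path are assumptions based on a default single-node VM):

nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
oozie.wf.application.path=${nameNode}/user/cloudera/wordcount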
- Next define your Apache Oozie workflow.xml file (a hedged sketch follows the list below). In my case the workflow has a single step, which is to execute the MapReduce job. I am setting the following properties:
- mapred.mapper.new-api & mapred.reducer.new-api: Set these properties to true if you are using the new MapReduce API based on the org.apache.hadoop.mapreduce.* classes
- mapreduce.map.class: The fully qualified name of your mapper class
- mapreduce.reduce.class: The fully qualified name of your reducer class
- mapred.output.key.class: Fully qualified name of the output key class. This is the same as the parameter to job.setOutputKeyClass() in your driver class
- mapred.output.value.class: Fully qualified name of the output value class. This is the same as the parameter to job.setOutputValueClass() in your driver class
- mapred.input.dir: Location of your input file. In my case I have sorttest.txt in the hdfs://localhost/user/cloudera directory
- mapred.output.dir: Location of the output that will get generated. In my case I want the output to go to the hdfs://localhost/user/cloudera/output/wordcount directory
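Here is a minimal sketch of the workflow.xml wired up with the properties described above. The mapper and reducer class names (com.example.WordCountMapper and com.example.WordCountReducer) and the output key/value classes (Text and IntWritable) are placeholders, since the real class names live in the HadoopWordCount3 repository; the jar placed in the workflow's lib folder is picked up automatically by Oozie.

<!-- Sketch of a single map-reduce action; class names and paths are placeholders -->
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- use the new org.apache.hadoop.mapreduce.* API -->
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <!-- placeholder mapper/reducer classes from the WordCount jar in lib/ -->
                <property>
                    <name>mapreduce.map.class</name>
                    <value>com.example.WordCountMapper</value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value>com.example.WordCountReducer</value>
                </property>
                <!-- same classes you would pass to job.setOutputKeyClass()/setOutputValueClass() -->
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <!-- input file and output directory on HDFS -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/cloudera/sorttest.txt</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/cloudera/output/wordcount</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>WordCount failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>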
- Once your Oozie workflow is ready, upload the wordcount folder to HDFS by executing the following command:
hdfs dfs -put wordcount wordcount
- Now run your Oozie workflow by executing the following command from your wordcount directory:

oozie job -oozie http://localhost:11000/oozie -config job.properties -run

If it runs successfully you should see the output generated in the hdfs://localhost/user/cloudera/output/wordcount directory.
Enabling Oozie console on Cloudera VM 4.4.0 and executing examples
I am trying to learn about Apache Oozie, so I wanted to figure out how to use it in the Cloudera 4.4.0 VM. When you go to the Oozie web console it shows a message saying that the console is disabled. In order to enable the console I had to follow these steps.
- Go to your Cloudera Manager; in it I went to the Oozie configuration screen and checked the Enable Oozie Server Web Console option. As you can see in the description, it says to install ExtJS 2.2 in /usr/lib/oozie/libext.
- Next I went to the /usr/lib/oozie/libext directory and executed the following command to download ext-2.2.zip:

wget 'http://extjs.com/deploy/ext-2.2.zip'

Since I am using CDH 4.4 I had to execute

unzip ext-2.2.zip

to unzip the ext-2.2.zip.
- The last step was to restart the Oozie service, and then I could see the Oozie web console.
- The first thing for me was to find the oozie-examples.tar.gz file on my VM:

find / -name oozie-examples.tar.gz

I found it under the /usr/share/doc/oozie-3.3.2+92/ directory. So I untarred it using:

tar xvf oozie-examples.tar.gz
- Then I had to make a change in job.properties, changing the value of nameNode and jobTracker from localhost to localhost.localdomain, to get rid of the

Error: E0901 : E0901: Namenode [localhost:8020] not allowed, not in Oozies whitelist

error.

nameNode=hdfs://localhost.localdomain:8020
jobTracker=localhost.localdomain:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
- After making the changes in job.properties I uploaded the examples folder to HDFS using the following command:
hdfs dfs -put examples examples
- The last step in the process was to actually run the MapReduce job in Oozie by executing the following command:
oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
- Once the job was started I could see its progress using the Oozie web console.