How to run Sqoop command from oozie

In the Importing data from Sqoop into Hive External Table with Avro encoding updated i blogged about how you can use sqoop to import data from RDBMS into Hadoop. I wanted to test if i can use Oozie for invoking Sqoop command and i followed these steps for doing that.
  1. First i tried executing this command from my command line on Hadoop cluster to make sure that i can actually run sqoop without any problem
    
    sqoop import --connect jdbc:mysql://localhost/test 
    --username root 
    --password cloudera 
    --table CUSTOMER 
    --as-avrodatafile
    
  2. Once the sqoop command was successfully executed i went back and deleted the CUSTOMER directory from HDFS to make sure that i could re-import data using following command
    
    hdfs dfs -rm -R CUSTOMER
    
  3. Next i went to Hue to create oozie workflow with single sqoop command that i had executed before
    But if your not using the Hue console you can create workflow.xml manually like this Also make sure to create job.properties file like this Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run oozie job from command line
  4. Next when i ran the Oozie workflow, the job failed with following error, which indicates that Oozie does not have the MySQL JDBC driver.
    
    java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
     at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
     at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
     at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
     at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
     at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
     at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
     at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
     at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
     at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846)
     at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646)
     at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
     at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
     at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
     at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
    
  5. So first thing i did was to check if mysql driver is there in the oozie shared lib by executing following commands
    
    export OOZIE_URL=http://localhost:11000/oozie
    oozie admin -shareliblist sqoop
    
    I noticed that the mysql-connector-java.jar was not there in the list of shared libs for Oozie + sqoop
  6. Next step was to find the mysql-connector-java.jar in my sandbox that i could do by finding it like this
    
    sudo find / -name mysql*
    
    I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar
  7. I wanted to update the Oozie shared lib to include the mysql driver jar. So i executed following command to figure out the directory where the oozie sqoop shared lib is
    
    oozie admin -sharelibupdate
    
    From this output i got HDFS directory location for Oozie shared lib which is /user/oozie/share/lib/lib_20160406022812
  8. Then i used following two commands to first copy the db driver into the oozie shared lib and making sure it is accessible to other users hdfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/. hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
  9. Now the last step was to let Oozie know that it should reload the sharedlib and i did that by executing following two commands
    
    oozie admin -sharedlibupdate
    oozie admin -shareliblist sqoop | grep mysql*
    
    The second command queries oozie to get current list of shared jars and i could see mysql-connector-java.jar listed in it like this
When i re-executed the ooize job again this time it ran successfully.

No comments: