Importing data from RDBMS into Hive using Sqoop and Oozie (hive-import)

In the "How to run Sqoop command from Oozie" entry I talked about how you can use Oozie and Sqoop to import data into HDFS. I wanted to change it to use Sqoop's hive-import option, which in addition to importing data into HDFS also creates a Hive table on top of the data. These are the steps that I followed:

- First I changed the workflow.xml to take out the as-avrodatafile option and added the hive-import option, and re-ran the workflow, which looks like this:
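(The exact workflow.xml is not reproduced here, so the following is a minimal sketch of an Oozie Sqoop action with the hive-import option; the JDBC URL, database name, credentials, and workflow name are placeholder values for the Cloudera QuickStart VM, not necessarily the exact ones from my workflow.)

<workflow-app name="sqoop-hive-import-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- hive-import makes Sqoop load the imported files into a Hive table
           after the HDFS import finishes -->
      <command>import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username retail_dba --password cloudera --table customers --hive-import</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>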
When I did that, the Oozie workflow failed with the following error:

7936 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
9202 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DBConfiguration - Fetching password from job credentials store
9207 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
9210 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '1=1' and upper bound '1=1'
25643 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 931.1768 KB in 17.6994 seconds (52.6107 KB/sec)
25649 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 12435 records.
25649 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Hive.inputTable: customers
25650 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Hive.outputTable: customers
25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Execute getColumnInfoRawQuery : SELECT t.* FROM `customers` AS t LIMIT 1
25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - No connection paramenters specified. Using regular API for making connection.
25658 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Using fetchSize for next query: -2147483648
25658 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_id of type [4, 11, 0]
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_fname of type [12, 45, 0]
25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_lname of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_email of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_password of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_street of type [12, 255, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_city of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_state of type [12, 45, 0]
25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager - Found column customer_zipcode of type [12, 45, 0]
25663 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter - Create statement: CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
25664 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter - Load statement: LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/customers' INTO TABLE `customers`
25667 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
25680 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport - Using in-process Hive instance.
25683 [uber-SubtaskRunner] DEBUG org.apache.sqoop.util.SubprocessSecurityManager - Installing subprocess security manager
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000007-161222163830473-oozie-oozi-W/sqoop-52c0--sqoop/action-data.seq
Oozie Launcher ends
- As you can see from the log, the Sqoop job was able to import data into HDFS in the /user/cloudera/customers directory, and I could actually see the data in that directory. But when Sqoop tried to create the table in Hive it failed, and the table did not get created. This is the log statement that I am referring to:

CREATE TABLE IF NOT EXISTS `customers` (
  `customer_id` INT,
  `customer_fname` STRING,
  `customer_lname` STRING,
  `customer_email` STRING,
  `customer_password` STRING,
  `customer_street` STRING,
  `customer_city` STRING,
  `customer_state` STRING,
  `customer_zipcode` STRING)
COMMENT 'Imported by sqoop on 2016/12/22 21:18:39'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\012'
STORED AS TEXTFILE
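To double-check that the data made it into HDFS while the Hive table is missing, you can run something like this (this verification step is my own addition, assuming the Hive CLI is available on the QuickStart VM):

hdfs dfs -ls /user/cloudera/customers
hive -e "SHOW TABLES LIKE 'customers';"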
- So it seems the problem is that Sqoop needs hive-site.xml so that it knows how to talk to the Hive service. To fix that, I first searched my sandbox to figure out where hive-site.xml is located, and then uploaded it to HDFS:

sudo find / -name hive-site.xml
hdfs dfs -put /etc/hive/conf.dist/hive-site.xml

(With no destination argument, hdfs dfs -put copies the file into the user's HDFS home directory, /user/cloudera in this case.)
- After that I went back to the workflow.xml and modified it to look like this:
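(Again, the modified workflow.xml is not reproduced here; this is a minimal sketch of the change, assuming hive-site.xml was uploaded to /user/cloudera as above. The <file> element ships hive-site.xml into the action's working directory so that Sqoop's in-process Hive instance can find the metastore configuration.)

<workflow-app name="sqoop-hive-import-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username retail_dba --password cloudera --table customers --hive-import</command>
      <!-- ship the Hive configuration with the action so the Sqoop launcher
           can talk to the Hive metastore -->
      <file>/user/cloudera/hive-site.xml#hive-site.xml</file>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>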