Importing data from RDBMS into Hive using Sqoop and oozie (hive-import)

In the How to run Sqoop command from oozie entry i talked about how you can use Oozie and Sqoop to import data into HDFS. I wanted to change it to use sqoop's hive-import option, which in addition to importing data into HDFS also creats Hive table on top of the data. These are the steps that i followed
  • First i changed the workflow.xml to take out as-avrodatafile and added hive-import option and i re-ran the workflow that looks like this When i did that the oozie workflow failed with following error
    
    7936 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
    9202 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DBConfiguration  - Fetching password from job credentials store
    9207 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
    9210 [uber-SubtaskRunner] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat  - Creating input split with lower bound '1=1' and upper bound '1=1'
    25643 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 931.1768 KB in 17.6994 seconds (52.6107 KB/sec)
    25649 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 12435 records.
    25649 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Hive.inputTable: customers
    25650 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Hive.outputTable: customers
    25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Execute getColumnInfoRawQuery : SELECT t.* FROM `customers` AS t LIMIT 1
    25653 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - No connection paramenters specified. Using regular API for making connection.
    25658 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Using fetchSize for next query: -2147483648
    25658 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_id of type [4, 11, 0]
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_fname of type [12, 45, 0]
    25659 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_lname of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_email of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_password of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_street of type [12, 255, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_city of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_state of type [12, 45, 0]
    25660 [uber-SubtaskRunner] DEBUG org.apache.sqoop.manager.SqlManager  - Found column customer_zipcode of type [12, 45, 0]
    25663 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter  - Create statement: CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
    25664 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.TableDefWriter  - Load statement: LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/customers' INTO TABLE `customers`
    25667 [uber-SubtaskRunner] INFO  org.apache.sqoop.hive.HiveImport  - Loading uploaded data into Hive
    25680 [uber-SubtaskRunner] DEBUG org.apache.sqoop.hive.HiveImport  - Using in-process Hive instance.
    25683 [uber-SubtaskRunner] DEBUG org.apache.sqoop.util.SubprocessSecurityManager  - Installing subprocess security manager
    Intercepting System.exit(1)
    
    <<< Invocation of Main class completed <<<
    
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
    
    Oozie Launcher failed, finishing Hadoop job gracefully
    
    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000007-161222163830473-oozie-oozi-W/sqoop-52c0--sqoop/action-data.seq
    
    Oozie Launcher ends
    
    
  • As you can see from the log the Sqoop job was able to import data into HDFS in /user/cloudera/customers directory and i could actually see the data in the directory. But when Sqoop tried to create the table in hive it failed and the table did not get created in hive, this is the log statement that i am referring to CREATE TABLE IF NOT EXISTS `customers` ( `customer_id` INT, `customer_fname` STRING, `customer_lname` STRING, `customer_email` STRING, `customer_password` STRING, `customer_street` STRING, `customer_city` STRING, `customer_state` STRING, `customer_zipcode` STRING) COMMENT 'Imported by sqoop on 2016/12/22 21:18:39' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
  • So it seems the problem is Sqoop needs hive-site.xml so that it knows how to talk to hive service, for that first i search my sandbox to figure out where hive-site.xml is located, i executed following command to first find the hive-site.xml and then uploading it to HDFS sudo find / -name hive-site.xml hdfs dfs -put /etc/hive/conf.dist/hive-site.xml
  • After that i went back to the workflow.xml and modified it to look like this
Now when i ran the oozie workflow it was successful and i could query customer data

17 comments:

Robert Welain said...

There's a lot of write my essay services throughout the net nowadays, so which to choose? The answer is - check everything for yourself. Click on a link I've provided to see one of the best in my opinion.

Stumagz.com said...

Hi Your Blog Is Very Nice! Attractive. Content is Nice

Stumagz is ultimate platform to release student magazine articles and engineering college news. It helps students to share their ideas in form of article.

stuMagz is an online platform that brings all the students and colleges together. Despite the fact that NBA accreditation says that every college must maintain a magazine to publish their content, there is no proper platform for students to expose themselves to the ecosystem and the lack of reach for the students to the various opportunities present in the market.There is a lack of connectivity between different colleges and universities.

The idea of stuMagz was born to bridge this gap and expand the scope for learning. It is a simple and efficient platform which provides every college and its students, a hassle free experience to publish their content and unleash their creativity.

Student Magazine Articles

Digital Campus Eco-System

Digital Stories In Hyderabad

Digital Classrooms In Hyderabad

Student Magazine Subscriptions

College Magazine Articles

College Fest Event

Top Engineering Colleges In India

For more Details Visit Us: Stumagz.com

srjwebsolutions said...


We are leading responsive website designing and development company in Noida.
We are offering mobile friendly responsive website designing, website development, e-commerce website, seo service and sem services in Noida.

Responsive Website Designing Company in Noida
Website Designing Company in Noida
SEO Services in Noida
SMO Services in Noida

BatteryMantra No. 1online battery store said...

Battery Mantra is Authorized exide car battery dealer in Noida and Greater Noida. We are providing our service in Indirapuram, Delhi, Ashok Nagar.

Exide Battery Dealer in Noida
Battery Dealer in Noida
Authorized Battery Dealer in Noida
Car Battery Dealer in Noida
Car Battery Dealer
Exide Battery Dealer

Genxsoft said...

Egmedi.com is online medical store pharmacy in laxmi nagar Delhi. You can Order prescription/OTC medicines online. Cash on Delivery available. Free Home Delivery


Online Pharmacy in Delhi
Buy Online medicine in Delhi
Online Pharmacy in laxmi nagar
Buy Online medicine in laxmi nagar
Onine Medical Store in Delhi
Online Medical store in laxmi nagar
Online medicine store in delhi
online medicine store in laxmi nagar
Purchase Medicine Online
Online Pharmacy India
Online Medical Store

Tejuteju said...

Really nice blog post. provided a helpful information. I hope that you will post more updates like this Big Data Hadoop Online Training Bangalore

Abhi said...

Thanks for info....
Website development in Bangalore

Abhi said...


Thanks for info
Website design in Bangalore At RT Nagar

abhi said...

Nice blog Thank you.

Website Designing Internship Internship in Bangalore

Internship Program

Web Development Internship in Bangalore



Mark Steven said...

top Website development company
best Website development company
website development company
website development company in India
top website development company in India
best website development company in India
website development company in USA
top website development company in USA
best website development company in USA
website development company in Canada
top website development company in Canada
best website development company in Canada
website development company in Germany
top website development company in Germany
best website development company in Germany

Marry07 said...

If you are looking for a job oriented practical based workday course syllabus curriculum at Workday training institutes in India then you are arrived at the right place called Workday training institutes in ameerpet because this institute is quite popular for this wonderful course not just in Workday training institutes in Jaipur and also proving job assistance for Workday training institutes in Pune

Jackie Co Kad said...

Great Article
Cyber Security Projects

projects for cse

Networking Projects

JavaScript Training in Chennai

JavaScript Training in Chennai

The Angular Training covers a wide range of topics including Components, Angular Directives, Angular Services, Pipes, security fundamentals, Routing, and Angular programmability. The new Angular TRaining will lay the foundation you need to specialise in Single Page Application developer. Angular Training

Catherine said...

All in one solution for readers business: supreme Package click on website and logo design package

Techsolve Solutions said...

Thanks for sharing useful information. Keep on sharing.

Web designer in canada
Digital marketing agency in canada

Geeta Limra said...

Very nice blog post. Thanks for sharing such a helpful article. Keep posting in the future also.

Web Development Company in Bangalore
Website Development Company in Bangalore

hari said...

nice Post thanks for the information, good information & very helpful for others.
Sailpoint Training
Looker Training

Anonymous said...

Take a look at MashMirror to find the best apps and tools. It has over 500 lists of best apps, software, and tools categorized in apps, tech, alternatives. MashMirror blogs provide the best information such as features, functionalities, and pricing of much different software, apps, and online tools. The single-column design lets you focus on what you are reading and find the answers to your curiosity. In addition to software, app, and SaaS lists, it has the latest information on Crypto, AI, and the latest tech.