Exporting data from Hive table to RDBMS

In the Importing data from RDBMS into Hadoop using sqoop i blogged about how to import data from RDBMS to Hive, but now i wanted to figure out how to export data from Hive back to RDBMS, Sqoop has export feature that allows you to export data from Hadoop directory(CSV files in a directory) to RDBMS, I wanted to try exporting data from sqoop so first i created a simple contact_hive table and populated some data in it, then i used sqoop to export the content of contact_hive table into contact table in MySQL, i followed these steps, if you already have a hive table populated then you can skip first 5 steps and go to step 6.
  1. Create contacthive.csv file which has simple data with 4 columns separated by comma
  2. Upload the contacthive.csv that you created in last step in HDFS at /tmp folder using following command
    hdfs dfs -put contacthive.csv /tmp
  3. Define a contact_hive table that will have 4 columns, contactId, firstName, lastName and email, execute this command in hive console
    CREATE TABLE contact_hive(contactId Int, firstName String, lastName String, email String) row format delimited fields terminated by "," stored as textfile;
  4. In this step populate the contact_hive table that you created in the last step with the data from contacthive.csv file created in step 1. Execute this command in Hive console to populate contact_hive table
    LOAD DATA INPATH  "/tmp/contacthive.csv" OVERWRITE INTO TABLE contact_hive;
  5. Since i am using Hive managed table, it will move the contacthive.csv file to Hive managed directory in case of Hortonworks that directory is /apps/hive/warehouse, You can verify that by executing following command on HDFS
    hdfs dfs -ls /apps/hive/warehouse/contact_hive
  6. Before you export data into RDBMS, you will have to create the table in mysql, use following command to create the CONTACT table in mysql.
          contactid INTEGER NOT NULL ,
          firstname VARCHAR(50),
          lastname  VARCHAR(50),
          email varchar(50)
  7. Now last step is to execute sqoop export command that exports data from hive/hdfs directory to database
    sqoop export --connect jdbc:mysql://localhost/test --table CONTACT --export-dir /apps/hive/warehouse/contact_hive


identifymecnu said...

Thank you for the clear steps Sunil.
Could you please also post how to export a parquet file from HDFS to RDBMS.

I have a directory with parquet files( generated from spark operations) and the corresponding metadata files. How do I specify sqoop that it is a parquet file and to load using the schema

Ananthi S said...

Great and interesting article to read.. i Gathered more useful and new information from this article.thanks a lot for sharing this article to us..

best big data training center in Chennai | best big data hadoop training and certification