Exporting data from Hive table to RDBMS

In the Importing data from RDBMS into Hadoop using sqoop i blogged about how to import data from RDBMS to Hive, but now i wanted to figure out how to export data from Hive back to RDBMS, Sqoop has export feature that allows you to export data from Hadoop directory(CSV files in a directory) to RDBMS, I wanted to try exporting data from sqoop so first i created a simple contact_hive table and populated some data in it, then i used sqoop to export the content of contact_hive table into contact table in MySQL, i followed these steps, if you already have a hive table populated then you can skip first 5 steps and go to step 6.
  1. Create contacthive.csv file which has simple data with 4 columns separated by comma
    
    1,MahendraSingh,Dhoni,mahendra@bcci.com
    2,Virat,Kohali,virat@bcci.com
    5,Sachin,Tendulkar,sachin@bcci.com
    
  2. Upload the contacthive.csv that you created in last step in HDFS at /tmp folder using following command
    
    hdfs dfs -put contacthive.csv /tmp
    
  3. Define a contact_hive table that will have 4 columns, contactId, firstName, lastName and email, execute this command in hive console
    
    CREATE TABLE contact_hive(contactId Int, firstName String, lastName String, email String) row format delimited fields terminated by "," stored as textfile;
    
  4. In this step populate the contact_hive table that you created in the last step with the data from contacthive.csv file created in step 1. Execute this command in Hive console to populate contact_hive table
    
    LOAD DATA INPATH  "/tmp/contacthive.csv" OVERWRITE INTO TABLE contact_hive;
    
  5. Since i am using Hive managed table, it will move the contacthive.csv file to Hive managed directory in case of Hortonworks that directory is /apps/hive/warehouse, You can verify that by executing following command on HDFS
    
    hdfs dfs -ls /apps/hive/warehouse/contact_hive
    
  6. Before you export data into RDBMS, you will have to create the table in mysql, use following command to create the CONTACT table in mysql.
    
    
    CREATE TABLE CUSTOMER (
          contactid INTEGER NOT NULL ,
          firstname VARCHAR(50),
          lastname  VARCHAR(50),
          email varchar(50)
    );
    
  7. Now last step is to execute sqoop export command that exports data from hive/hdfs directory to database
    
    sqoop export --connect jdbc:mysql://localhost/test --table CONTACT --export-dir /apps/hive/warehouse/contact_hive
    

1 comment:

identifymecnu said...

Thank you for the clear steps Sunil.
Could you please also post how to export a parquet file from HDFS to RDBMS.

I have a directory with parquet files( generated from spark operations) and the corresponding metadata files. How do I specify sqoop that it is a parquet file and to load using the schema