Importing data from Sqoop into Hive External Table with Avro encoding (updated)

In the Importing data from Sqoop into Hive External Table with Avro encoding post I described how you can import a table from an RDBMS into Hive using Sqoop in Avro format. In that blog I went through a few steps to get the .avsc file, but I realized there is an easier way to do it by following these steps
  1. First execute the sqoop import command like this. Make sure that you pass --outdir schema as a parameter to the sqoop import command; what that does is generate CUSTOMER.avsc and CUSTOMER.java in the schema directory on your local machine
    
    sqoop import --connect jdbc:mysql://localhost/test \
    --username root \
    --password cloudera \
    --table CUSTOMER \
    --as-avrodatafile \
    --outdir schema
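
    You can also take a quick look at the generated schema if you are curious. The output below is only illustrative (it assumes the CUSTOMER table has ID and NAME columns); the actual fields will mirror the columns of your table:

    cat schema/CUSTOMER.avsc
    {
      "type" : "record",
      "name" : "CUSTOMER",
      "fields" : [
        { "name" : "ID",   "type" : [ "null", "int" ] },
        { "name" : "NAME", "type" : [ "null", "string" ] }
      ]
    }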
    
  2. You can verify that the CUSTOMER.avsc file got created as expected by executing ls -ltrA schema, for example:
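
    ls -ltrA schema
    # the listing should include the two files Sqoop generated:
    # CUSTOMER.avsc and CUSTOMER.java
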
  3. Next, create a schema directory in HDFS by executing the hdfs dfs -mkdir command like this
    
    hdfs dfs -mkdir /user/cloudera/schema
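    # optional sanity check: the new schema directory should now show up under /user/cloudera
    hdfs dfs -ls /user/cloudera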
    
  4. Copy CUSTOMER.avsc from your local schema directory to the schema directory in HDFS by executing the following command
    
    hdfs dfs -copyFromLocal schema/CUSTOMER.avsc /user/cloudera/schema/.
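    # optional sanity check: the schema should be readable at the exact path that
    # avro.schema.url will point to in the next step
    hdfs dfs -cat /user/cloudera/schema/CUSTOMER.avsc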
    
  5. The last step is to create the Hive external table with CUSTOMER.avsc as its schema, using the following command
    
    CREATE EXTERNAL TABLE CUSTOMER
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/cloudera/CUSTOMER'
    TBLPROPERTIES ('avro.schema.url'='/user/cloudera/schema/CUSTOMER.avsc');
    
Now if you go to Hive and execute the "SELECT * FROM CUSTOMER;" query, you should see the 1 record that was imported from the source table.
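
If you prefer to run the query without opening the interactive Hive shell, you can also pass it straight to the hive CLI, like this:

    hive -e "SELECT * FROM CUSTOMER;"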
