Importing data from Sqoop into Hive External Table with Avro encoding

I wanted to figure out how to import the contents of an RDBMS table into Hive with Avro encoding. During this process I wanted to use external Hive tables so that I have complete control over the location of the data files.
Note: I have a different/easier method for doing this; see Importing data from Sqoop into Hive External Table with Avro encoding updated
First I created the following table in the MySQL database, which runs on the same machine as my Hortonworks Sandbox.
  1. First create the CUSTOMER table in MySQL like this
    CREATE TABLE CUSTOMER (
      contactid INTEGER NOT NULL,
      firstname VARCHAR(50),
      lastname  VARCHAR(50),
      email     VARCHAR(50) );
  2. After creating the table, add a couple of records to it by executing an insert statement like the following
    insert into customer values(1,'Sachin','Tendulark','');
  3. The next step is to run a Sqoop import that downloads the records of the table into HDFS at /tmp/customer/sample. In the real world you might want to download only the first 10 records or so, because you only need a few sample records to generate the Avro schema
    sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER --username sqoop1 
    --password sqoop -m 1 --create-hive-table 
    --hive-table CONTACT --as-avrodatafile  --target-dir /tmp/customer/sample
  4. Running the Sqoop command dumps the records into HDFS, so first download the Avro file generated by Sqoop
    hdfs dfs -get /tmp/customer/sample/part-m-00000.avro
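    Before extracting the schema, you can sanity-check the downloaded file. Per the Avro specification, an Object Container File starts with the 4-byte magic b"Obj\x01" and embeds its schema as JSON in the file header, which is what avro-tools reads in the next step. A minimal Python sketch (the temporary fake file is an assumption standing in for the real part-m-00000.avro):

```python
# Sketch: verify that the file Sqoop wrote is an Avro Object Container File.
# Such files start with the 4-byte magic b"Obj\x01"; the schema itself is
# stored in the header metadata that follows.
import os
import tempfile

def is_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"

# Stand-in for a real part-m-00000.avro (fake header; real files carry
# schema metadata and data blocks after the magic):
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Obj\x01" + b"\x00" * 16)
    fake = tmp.name

print(is_avro_container(fake))  # True
os.remove(fake)
```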
  5. Use avro-tools-*.jar to read the schema of the file generated by Sqoop, by executing the following command
    java -jar avro-tools-1.7.5.jar getschema part-m-00000.avro > customer.avsc
    This is what the customer.avsc file looks like in my case
      "type" : "record",
      "name" : "CUSTOMER",
      "doc" : "Sqoop import of CUSTOMER",
      "fields" : [ {
        "name" : "contactid",
        "type" : [ "int", "null" ],
        "columnName" : "contactid",
        "sqlType" : "4"
      }, {
        "name" : "firstname",
        "type" : [ "string", "null" ],
        "columnName" : "firstname",
        "sqlType" : "12"
      }, {
        "name" : "lastname",
        "type" : [ "string", "null" ],
        "columnName" : "lastname",
        "sqlType" : "12"
      }, {
        "name" : "email",
        "type" : [ "string", "null" ],
        "columnName" : "email",
        "sqlType" : "12"
      } ],
      "tableName" : "CUSTOMER"
  6. The next step is to upload the Avro schema file that you created in the last step back to HDFS. In my case I had an HDFS folder called /tmp/customer/schema, and I uploaded the schema file into it
    hdfs dfs -put customer.avsc /tmp/customer/schema/
  7. Now go to Hive and execute the following command to define the external CUSTOMER Hive table, using the Avro schema uploaded in the last step
    CREATE EXTERNAL TABLE CUSTOMER
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/tmp/customer/data'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/customer/schema/customer.avsc');
  8. The last step is to run Sqoop again, but this time importing all the data into the external directory that the CUSTOMER Hive table points to
    sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER --username sqoop1 
    --password sqoop -m 1 --as-avrodatafile 
     --target-dir /tmp/customer/data --compression-codec snappy
Now if you run a select query on the CUSTOMER table in Hive, you should get all the data that you see in your RDBMS table.
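Since the external table above has no explicit column list, AvroSerDe derives the Hive columns from the schema file. A rough Python sketch of that derivation, assuming only the int/string primitives used above (the real SerDe supports many more types):

```python
# Rough sketch: AvroSerDe maps Avro field types to Hive column types, so the
# CREATE EXTERNAL TABLE needs no column list. Simplified mapping for the
# primitives used in this post.
AVRO_TO_HIVE = {"int": "int", "long": "bigint", "string": "string"}

def hive_columns(avro_fields):
    cols = []
    for f in avro_fields:
        # Sqoop emits nullable unions like ["int", "null"]; use the non-null branch.
        t = f["type"]
        if isinstance(t, list):
            t = next(x for x in t if x != "null")
        cols.append((f["name"], AVRO_TO_HIVE[t]))
    return cols

fields = [
    {"name": "contactid", "type": ["int", "null"]},
    {"name": "firstname", "type": ["string", "null"]},
    {"name": "lastname", "type": ["string", "null"]},
    {"name": "email", "type": ["string", "null"]},
]
print(hive_columns(fields))
# [('contactid', 'int'), ('firstname', 'string'), ('lastname', 'string'), ('email', 'string')]
```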


Anonymous said...

Hi Sunil,
While trying to run the commands as described in your blog, I am getting the error below:
"Error: Unable to access jarfile avro-tools..."

Can you please help me identify the reason, or am I missing some configuration?

phanikb said...

You need to download avro-tools.jar from the internet and move it to the directory from which you want to run the above command.

bharat batra said...

I am not able to read the data from CUSTOMER: class not found error org/xerial/snappy/Snappy.
Can anyone please advise?

wirawan0 said...

I thought the Avro format had to be used in conjunction with its schema, not by itself. In other words, I was not expecting Avro to be self-describing by storing the schema inside the file. Am I wrong here? One of the commands above demonstrates that there is a tool to extract the schema from the Avro file.

Sameer said...

Nice Article. thanks!!

Srividya Parthasarathy said...

good one to poc avro...

Vamsi Krishna said...

What is the purpose of [--create-hive-table and --hive-table CONTACT] in the Sqoop command in step 3? We are not importing any data into a Hive table there; we are just generating a schema file based on sample data. Where are you using this CONTACT Hive table in your post?

Barış Taşkend said...

Easy to understand. Thanks
