Note: I have a different/easier method for doing this in "Importing data from Sqoop into Hive External Table with Avro encoding updated".
First I created the following table in the MySQL database, which is on the same machine as my Hortonworks Sandbox.
- First, create the CUSTOMER table in MySQL:
CREATE TABLE CUSTOMER ( contactid INTEGER NOT NULL , firstname VARCHAR(50), lastname VARCHAR(50), email varchar(50) );
- After creating the table, add a few records to it by executing insert statements like the following:
insert into customer values(1,'Sachin','Tendulark','sachin@gmail.com');
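If you want to sanity-check the DDL and insert statement without touching your MySQL instance, here is a quick local sketch that replays them against Python's built-in sqlite3 (sqlite3 is only a stand-in for illustration; the real table lives in MySQL):

```shell
# Run the same CREATE/INSERT against an in-memory SQLite database
# purely to confirm the statements are well-formed and the row lands.
python3 - <<'EOF'
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER ( contactid INTEGER NOT NULL , "
             "firstname VARCHAR(50), lastname VARCHAR(50), email VARCHAR(50) )")
conn.execute("insert into CUSTOMER values(1,'Sachin','Tendulark','sachin@gmail.com')")
print(conn.execute("SELECT * FROM CUSTOMER").fetchall())
EOF
```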
- The next step is to run a Sqoop import that downloads the table's records into HDFS at /tmp/customer/sample. In the real world you might want to download only the first 10 records or so, because you only need a few sample records to create the Avro schema:
sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER --username sqoop1 --password sqoop -m 1 --create-hive-table --hive-table CONTACT --as-avrodatafile --target-dir /tmp/customer/sample
- Running the Sqoop command dumps the records into HDFS, so first download the Avro file that Sqoop generated:
hdfs dfs -get /tmp/customer/sample/part-m-00000.avro
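As an aside, Avro object container files are self-describing: the first four bytes are the magic b'Obj\x01' and the writer schema is embedded in the file header, which is why the getschema step below works at all. A small stand-alone sketch of that check (it fabricates a stand-in file here rather than using the real part-m-00000.avro):

```shell
# Any Avro container file begins with the magic bytes b'Obj\x01'; a file
# failing this check was not written as an Avro data file. Demo on a stand-in.
printf 'Obj\001' > demo.avro
python3 - <<'EOF'
with open("demo.avro", "rb") as f:
    magic = f.read(4)
print("looks like Avro" if magic == b"Obj\x01" else "not an Avro container")
EOF
```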
- Use avro-tools-*.jar to read the schema of the file generated by Sqoop, by executing the following command:
java -jar avro-tools-1.7.5.jar getschema part-m-00000.avro > customer.avsc
This is how the customer.avsc file looks in my case:
{
  "type" : "record",
  "name" : "CUSTOMER",
  "doc" : "Sqoop import of CUSTOMER",
  "fields" : [ {
    "name" : "contactid",
    "type" : [ "int", "null" ],
    "columnName" : "contactid",
    "sqlType" : "4"
  }, {
    "name" : "firstname",
    "type" : [ "string", "null" ],
    "columnName" : "firstname",
    "sqlType" : "12"
  }, {
    "name" : "lastname",
    "type" : [ "string", "null" ],
    "columnName" : "lastname",
    "sqlType" : "12"
  }, {
    "name" : "email",
    "type" : [ "string", "null" ],
    "columnName" : "email",
    "sqlType" : "12"
  } ],
  "tableName" : "CUSTOMER"
}
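Since an Avro schema is plain JSON, it can be worth sanity-checking that the .avsc file parses before uploading it to HDFS; a bad schema URL or malformed JSON only surfaces later as a confusing Hive error. A minimal sketch (it re-creates a trimmed illustrative copy of the schema locally; the columnName/sqlType attributes are omitted for brevity):

```shell
# Write an illustrative copy of the extracted schema to customer.avsc,
# then confirm it parses as JSON before pushing it to HDFS.
cat > customer.avsc <<'EOF'
{"type": "record", "name": "CUSTOMER", "doc": "Sqoop import of CUSTOMER",
 "fields": [
   {"name": "contactid", "type": ["int", "null"]},
   {"name": "firstname", "type": ["string", "null"]},
   {"name": "lastname",  "type": ["string", "null"]},
   {"name": "email",     "type": ["string", "null"]}
 ],
 "tableName": "CUSTOMER"}
EOF
python3 -m json.tool customer.avsc > /dev/null && echo "schema OK"
```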
- The next step is to upload the Avro schema file that you created in the last step back to HDFS. In my case I had an HDFS folder called /tmp/customer/schema, and I uploaded the Avro schema file into it:
hdfs dfs -put customer.avsc /tmp/customer/schema/
- Now go to Hive and execute the following command to define the external CUSTOMER Hive table with the Avro schema created in the last step:
CREATE EXTERNAL TABLE CUSTOMER
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/customer/data'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/customer/schema/customer.avsc');
- The last step is to run Sqoop again, but this time importing all the data into the external directory that the CUSTOMER Hive table points to:
sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER --username sqoop1 --password sqoop -m 1 --as-avrodatafile --target-dir /tmp/customer/data --compression-codec snappy
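The whole pipeline above can be recapped as one script. This sketch only echoes each command (via a run wrapper) instead of executing it, since the connection string, credentials, and paths are specific to my sandbox; drop the wrapper to run it for real:

```shell
# Dry-run recap of the steps above: print each command instead of running it.
run() { echo "+ $*"; }

# 1. Pull a small sample into HDFS as Avro
run sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER \
  --username sqoop1 --password sqoop -m 1 \
  --as-avrodatafile --target-dir /tmp/customer/sample
# 2. Fetch the generated Avro file and extract its schema
run hdfs dfs -get /tmp/customer/sample/part-m-00000.avro
run java -jar avro-tools-1.7.5.jar getschema part-m-00000.avro
# 3. Publish the schema to HDFS for the Hive table to reference
run hdfs dfs -put customer.avsc /tmp/customer/schema/
# 4. (Create the external Hive table in the Hive shell, then...)
# 5. Import the full table into the directory the Hive table points at
run sqoop import --connect jdbc:mysql://localhost/test --table CUSTOMER \
  --username sqoop1 --password sqoop -m 1 \
  --as-avrodatafile --target-dir /tmp/customer/data --compression-codec snappy
```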
hi sunil,
While trying to run the commands as you described in your blog, I am getting the error below:
"Error: Unable to access jarfile avro-tools..."
Can you please help me identify the reason, or am I missing any configuration?
You need to download avro-tools.jar from the internet and move it to the directory where you want to run the above command.
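For that error, a small defensive sketch (assuming the 1.7.5 jar name used in this post) that checks the jar is present in the current directory before invoking it:

```shell
# Fail fast with a helpful message if avro-tools is not in the current directory.
JAR=avro-tools-1.7.5.jar
if [ -f "$JAR" ]; then
  java -jar "$JAR" getschema part-m-00000.avro > customer.avsc
else
  echo "missing $JAR: download it (e.g. from Maven Central) into this directory first"
fi
```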
I am not able to read the data from CUSTOMER; I get a NoClassDefFoundError: org/xerial/snappy/Snappy.
Anyone please advise.
I thought the Avro format must be used in conjunction with its schema, not by itself. In other words, I was not expecting Avro to be self-describing by storing the schema inside the file. Am I wrong here? One of the commands above demonstrates that there is a tool to extract the schema from the Avro file.

Nice article, thanks!!
Good one to PoC Avro...
What is the purpose of [--create-hive-table and --hive-table CONTACT] in the step 3 Sqoop command? We are not importing any data into a Hive table there; we are just generating a schema file based on sample data. Where are you using this CONTACT Hive table in your post?
Easy to understand. Thanks!