- First thing is to find out the schema of the table in Avro and you can get that by executing following statement in hive
You will get output that looks something like this, it contains schema of the tableshow create table CUSTOMER;
CREATE EXTERNAL TABLE `CUSTOMER`( `contactid` int COMMENT 'from deserializer', `firstname` string COMMENT 'from deserializer', `lastname` string COMMENT 'from deserializer', `email` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://sandbox.hortonworks.com:8020/tmp/customer/data' TBLPROPERTIES ( 'avro.schema.url'='hdfs:///tmp/customer/schema/customer.avsc', 'transient_lastDdlTime'='1431719666')
-
Copy the schema from last step and remove the part about format and table properties and replace it with part that highlighted in this code, execute this in Hive to create customer table in ORC format
CREATE EXTERNAL TABLE `CUSTOMER_ORC`( `contactid` int , `firstname` string , `lastname` string , `email` string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LOCATION 'hdfs://sandbox.hortonworks.com:8020/tmp/customer/data_orc' STORED AS ORC tblproperties ("orc.compress"="SNAPPY","orc.row.index.stride"="20000");
-
Last step is to copy data from avro table to ORC table, you can achieve that by using following command
insert into table CUSTOMER_ORC select * from customer;
Moving data from Avro to ORC files
In the Importing data from Sqoop into Hive External Table with Avro encoding i blogged about how to sqoop data from RDBMS into Hive. But i wanted to take it to next step by moving the data downloaded to ORC table. I followed these steps to achieve that
Subscribe to:
Post Comments (Atom)
1 comment:
Thanks for info....
Website development in Bangalore
Post a Comment