Using Elasticsearch as an external data store with Apache Hive

The elasticsearch-hadoop connector lets you define a Hive table that actually points to an index in Elasticsearch. I wanted to learn how to use this feature, so I followed these steps:
  1. First, I created the contact/contact index and type in Elasticsearch and inserted 4 records into it.
  2. Next, I downloaded the elasticsearch-hadoop zip file onto my Hadoop VM by running the following command:
    
    wget http://download.elasticsearch.org/hadoop/elasticsearch-hadoop-2.0.0.RC1.zip
    
    I then extracted elasticsearch-hadoop-2.0.0.RC1.zip in the /root directory.
  3. Next, I started the Hive console with the following command. Note that elasticsearch-hadoop-2.0.0.RC1.jar has to be added to hive.aux.jars.path:
    
    hive -hiveconf hive.aux.jars.path=/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
    
  4. Next, I defined an artists table in Hive that points to the contact index on the Elasticsearch server:
    
    CREATE EXTERNAL TABLE artists (
    fname STRING,
    lname STRING,
    email STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'contact/contact',
                  'es.index.auto.create' = 'false') ;
    
  5. Once the table is configured, I could query it like any normal Hive table.
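
To make step 5 concrete, here is a minimal sketch of running a query against the artists table from the command line. The SELECT statement, the LIMIT, and the RUN_HIVE switch are assumptions added for illustration and are not the exact query from my session. The jar path is the one from step 3, and the column names come from the table definition in step 4.

```shell
#!/bin/sh
# Sketch only: the query below is illustrative, not the original one from the post.

# Connector jar path, as used in step 3.
ES_JAR=/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar

# Assumed example query; the columns match the artists table from step 4.
QUERY="SELECT fname, lname, email FROM artists LIMIT 10;"

# RUN_HIVE is a hypothetical switch for this sketch: set RUN_HIVE=1 to
# execute the query, otherwise the command is only printed (dry run).
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -hiveconf hive.aux.jars.path="$ES_JAR" -e "$QUERY"
else
  echo "hive -hiveconf hive.aux.jars.path=$ES_JAR -e \"$QUERY\""
fi
```

Running the script as is only prints the hive command. Running it with RUN_HIVE=1 on the Hadoop VM should send the query through the storage handler to the contact/contact index, assuming the table from step 4 already exists.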

3 comments:

Anonymous said...

Many thanks Sunil for this very clear & helpful post.

Anonymous said...

Hi,
This post is excellent and working for me.
Can you please guide me on how to store data into ElasticSearch using Hive?

Unknown said...

It was a very, very helpful tutorial. Thanks a lot.