Using ElasticSearch to store the output of a Pig script

I wanted to learn how to use ElasticSearch to store the output of a Pig script. I created a simple text file with the names of some cricket players, their role in the team and their email IDs. Then I used a Pig script to load that text file into ElasticSearch. These were the steps:
  1. First I created a cricket.txt file with the cricketers' information, like this:
    
    Virat Kohli batsman virat@bcci.com
    MahendraSingh Dhoni batsman mahendra@bcci.com
    Shikhar Dhawan batsman shikhar@bcci.com
    
  2. Next I uploaded the cricket.txt file to the /user/root directory in HDFS:
    
    hdfs dfs -copyFromLocal cricket.txt /user/root/cricket.txt
    
  3. After that I downloaded the ElasticSearch Hadoop zip and expanded it locally. Then I uploaded the whole elasticsearch-hadoop-2.0.0.RC1 directory to HDFS so that it is available to every node in the cluster:
    
    hdfs dfs -copyFromLocal elasticsearch-hadoop-2.0.0.RC1/ /user/root/
    
  4. Then I created this cricketes.pig script. It first registers the ElasticSearch-related jar files with Pig. Next it loads the content of cricket.txt into the cricket relation. Finally it stores that content in the ElasticSearch instance on localhost, in the pig index under the cricket type (the pig/cricket resource):
    
    
    /*
    Register the elasticsearch hadoop related jar files
    */
    
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar
    
    -- Load the content of /user/root/cricket.txt into Pig.
    -- The sample file is space-separated, but PigStorage() splits on tabs by default,
    -- so the space delimiter is passed explicitly.
    cricket = LOAD '/user/root/cricket.txt' USING PigStorage(' ') AS (fname:chararray, lname:chararray, skill:chararray, email:chararray);
    DUMP cricket;
    -- Store the cricket relation in the local ElasticSearch instance: index pig, type cricket
    STORE cricket into 'pig/cricket' USING org.elasticsearch.hadoop.pig.EsStorage;
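The script above relies on the EsStorage defaults, which point at an ElasticSearch node on localhost, port 9200. EsStorage also accepts `key=value` configuration strings, and es.nodes and es.port are the settings that select the target cluster. The shell sketch below is a minimal illustration that generates a variant of the script with the endpoint taken from environment variables. The variable names ES_HOST and ES_PORT and the file name cricket-remote.pig are hypothetical. The generated script runs only if the pig command is on the PATH.

```shell
#!/bin/sh
# Sketch: generate a Pig script whose EsStorage target is configurable.
# ES_HOST, ES_PORT and cricket-remote.pig are hypothetical names for illustration.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"

# Unquoted EOF so the shell substitutes ${ES_HOST} and ${ES_PORT} into the script.
cat > cricket-remote.pig <<EOF
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar

cricket = LOAD '/user/root/cricket.txt' USING PigStorage(' ')
    AS (fname:chararray, lname:chararray, skill:chararray, email:chararray);

-- es.nodes and es.port tell elasticsearch-hadoop which cluster to write to
STORE cricket INTO 'pig/cricket'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=${ES_HOST}', 'es.port=${ES_PORT}');
EOF

# Run the generated script only when Pig is installed on this machine.
if command -v pig >/dev/null 2>&1; then
    pig cricket-remote.pig
else
    echo "pig not found; only generated cricket-remote.pig" >&2
fi
```

For example, if the sketch is saved as run-cricket.sh (also a hypothetical name), `ES_HOST=172.100.34.12 ES_PORT=9200 sh run-cricket.sh` would write to a cluster at that address instead of localhost.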
    
After running the Pig script I checked the content of the pig/cricket index in ElasticSearch, and the content of the text file was there.
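The same check can be done from the command line with ElasticSearch's REST API. The sketch below assumes the default localhost:9200 endpoint, which can be overridden with the hypothetical ES_HOST and ES_PORT variables. It uses the pig/cricket index and type from the script. The `skill:batsman` query-string filter is only an example on one of the loaded fields. If the job succeeded, the count should match the number of rows in cricket.txt.

```shell
#!/bin/sh
# Sketch: query the pig index / cricket type over ElasticSearch's REST API.
# ES_HOST and ES_PORT are hypothetical variable names; defaults match a local node.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"
BASE_URL="http://${ES_HOST}:${ES_PORT}/pig/cricket"

# Number of documents stored by the Pig script.
COUNT_URL="${BASE_URL}/_count?pretty"
# URI search: documents whose skill field matches "batsman".
SEARCH_URL="${BASE_URL}/_search?q=skill:batsman&pretty"

for url in "$COUNT_URL" "$SEARCH_URL"; do
    echo "GET $url"
    # --max-time keeps the check from hanging if the node is unreachable.
    curl -s --max-time 5 "$url" || echo "could not reach ElasticSearch at $url" >&2
done
```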

1 comment:

  1. Hi,

    Where do you configure the ElasticSearch cluster endpoint?

    For example, my ElasticSearch cluster is accepting data at:

    http://172.100.34.12:9200

    How can I tell the plugin to connect to my cluster?

    Thanks