Showing posts with label pig. Show all posts

Reading content of ElasticSearch index into Pig Script

In Using ElasticSearch for storing output of Pig Script, I built a sample that stores the output of a Pig script in ElasticSearch. I wanted to try the reverse and use the content of an ElasticSearch index (or a search result) as input to a Pig script, so I built this sample.
  1. First, follow step 3 in Using ElasticSearch for storing output of Pig Script to download the ElasticSearch Hadoop jars and upload them to HDFS.
  2. Next, create a Pig script like the one below. The first two lines make the ElasticSearch Hadoop jars available to Pig. The DEFINE statement creates an alias for org.elasticsearch.hadoop.pig.EsStorage and gives it the short, user-friendly name ES. The LOAD statement tells Pig to load the content of the pig/cricket index on the local ElasticSearch instance into the relation A. The last line dumps the content of A.
    
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar
    
    DEFINE ES org.elasticsearch.hadoop.pig.EsStorage;
    A = LOAD 'pig/cricket' USING ES;
    DUMP A;
    
After I executed the script, I could see the output like this
Note: Before I got it to work, I was using v = LOAD 'pig/cricket' USING org.elasticsearch.pig.EsStorage to load the content from ES, and it kept throwing the following error. I realized that I was using the wrong package name: the class lives in org.elasticsearch.hadoop.pig, not org.elasticsearch.pig.

grunt> v = LOAD 'pig/cricket' USING org.elasticsearch.pig.EsStorage;
2014-05-14 15:56:48,873 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.elasticsearch.pig.EsStorage using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /root/pig_1400106825043.log
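
To run the working script from step 2 outside the grunt shell, it can be saved to a file and passed to the pig launcher. The sketch below does that, and also narrows the load to a search result instead of the whole index. The file name readcricket.pig and the skill:batsman query are made up for illustration. The es.query setting is described in the elasticsearch-hadoop documentation; that the 2.0.0.RC1 jars used here accept it in exactly this form is an assumption I have not verified.

```shell
# Write the Pig script to a file; the quoted 'EOF' stops the shell from expanding anything inside it.
cat > readcricket.pig <<'EOF'
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar

-- Assumption: es.query takes a URI-style query; skill:batsman is only an illustration.
DEFINE ES org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=skill:batsman');
A = LOAD 'pig/cricket' USING ES;
DUMP A;
EOF

# Run the script only when the pig launcher is on the PATH.
if command -v pig >/dev/null 2>&1; then
  pig readcricket.pig
else
  echo "pig not found on PATH; wrote readcricket.pig only"
fi
```

Dropping the argument from the DEFINE line gives back the original behaviour of loading every document in pig/cricket.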

Using ElasticSearch for storing output of Pig Script

I wanted to learn how to use ElasticSearch for storing the output of a Pig script, so I created a simple text file that has the names of cricket players, their role in the team, and their email id. Then I used a Pig script to load the text file into ElasticSearch. I used the following steps
  1. First I created a cricket.txt file that contains the cricketers' information like this
    
    Virat Kohli batsman virat@bcci.com
    MahendraSingh Dhoni batsman mahendra@bcci.com
    Shikhar Dhawan batsman shikhar@bcci.com
    
  2. The next step was to upload the cricket.txt file to the HDFS /user/root directory
    
    hdfs dfs -copyFromLocal cricket.txt /user/root/cricket.txt
    
  3. After that I downloaded the ElasticSearch Hadoop zip and expanded it on my local machine. Then I uploaded the whole elasticsearch-hadoop-2.0.0.RC1 directory to HDFS so that it is available to all the nodes in the cluster
    
    hdfs dfs -copyFromLocal elasticsearch-hadoop-2.0.0.RC1/ /user/root/
    
  4. Then I created this cricketes.pig script. It first registers the ElasticSearch jar files with Pig, then loads the content of the cricket.txt file into the cricket relation, and finally stores that content in the pig/cricket index on the local host
    
    
    /*
    Register the elasticsearch hadoop related jar files
    */
    
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
    REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar
    
    -- Load the content of /user/root/cricket.txt into Pig
    cricket = LOAD '/user/root/cricket.txt' AS (fname:chararray, lname:chararray, skill:chararray, email:chararray);
    DUMP cricket;
    -- Store the content of the cricket relation in the pig/cricket index of the ElasticSearch instance on the local server
    STORE cricket into 'pig/cricket' USING org.elasticsearch.hadoop.pig.EsStorage;
    
After running the Pig script, I verified the content of the pig/cricket index in ES, and I could see the content of the text file like this
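
One detail about step 1 is easy to miss. A LOAD without a USING clause goes through Pig's default PigStorage loader, which splits fields on tabs, so cricket.txt needs a tab between each of the four columns. A minimal sketch of creating the file that way from the shell (the awk check is only an added sanity test, not part of the original steps):

```shell
# Write cricket.txt with one tab between the four fields (fname, lname, skill, email).
# printf reuses the format string for every group of four arguments, giving one line per player.
printf '%s\t%s\t%s\t%s\n' \
  Virat Kohli batsman virat@bcci.com \
  MahendraSingh Dhoni batsman mahendra@bcci.com \
  Shikhar Dhawan batsman shikhar@bcci.com > cricket.txt

# Sanity check: print the number of tab-separated fields on each line (should be 4 for every row).
awk -F'\t' '{ print NF }' cricket.txt
```

If the file is separated by single spaces instead, one option is to pass the delimiter explicitly with USING PigStorage(' ') in the LOAD statement.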