I wanted to learn how to use Elasticsearch for storing the output of a Pig script, so I created a simple text file containing the names of cricket players, their role in the team, and their email ID. Then I used a Pig script to load the text file into Elasticsearch. I followed these steps:
- First, I created a cricket.txt file that contains the cricketers' information like this:
Virat Kohli batsman virat@bcci.com
MahendraSingh Dhoni batsman mahendra@bcci.com
Shikhar Dhawan batsman shikhar@bcci.com
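The LOAD statement used later in this post has no USING clause, so Pig falls back to its default PigStorage loader, which splits fields on tabs. The listing above does not show which separator the file used, so the following is only a sketch that assumes tab-separated fields; it recreates the same three rows and checks the field count.

```shell
# Assumption: fields are separated by a single tab, PigStorage's default delimiter.
# printf reuses the format string for every group of four arguments, giving one row per player.
printf '%s\t%s\t%s\t%s\n' \
    Virat Kohli batsman virat@bcci.com \
    MahendraSingh Dhoni batsman mahendra@bcci.com \
    Shikhar Dhawan batsman shikhar@bcci.com > cricket.txt

# Sanity check: print the number of tab-separated fields on each line (4 is expected for every row).
awk -F'\t' '{ print NF }' cricket.txt
```

If a line reports a field count other than 4, the schema in the LOAD statement would not line up with the data.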
- The next step was to upload the cricket.txt file to the HDFS /user/root directory:
hdfs dfs -copyFromLocal cricket.txt /user/root/cricket.txt
- After that, I downloaded the Elasticsearch Hadoop zip and extracted it on my local machine. I then uploaded the whole elasticsearch-hadoop-2.0.0.RC1 directory to HDFS so that it is available from all the nodes in the cluster:
hdfs dfs -copyFromLocal elasticsearch-hadoop-2.0.0.RC1/ /user/root/
- Then I created this cricketes.pig script. As a first step it registers the Elasticsearch-related jar files in Pig, then it loads the content of the cricket.txt file into the cricket relation, and finally it stores that content in the pig/cricket index on localhost:
/*
Register the elasticsearch hadoop related jar files
*/
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar
-- Load the content of /user/root/cricket.txt into Pig (with no USING clause, Pig's default PigStorage loader splits fields on tabs)
cricket = LOAD '/user/root/cricket.txt' AS (fname:chararray, lname:chararray, skill:chararray, email:chararray);
DUMP cricket;
-- Store the content of the cricket relation in the Elasticsearch instance on the local server, in the pig/cricket index
STORE cricket INTO 'pig/cricket' USING org.elasticsearch.hadoop.pig.EsStorage();
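The script above writes to Elasticsearch on the local server because EsStorage is used with its defaults. As far as I know, elasticsearch-hadoop reads the `es.nodes` and `es.port` settings for the connection, and EsStorage accepts such settings as string arguments. The sketch below generates a variant of the script that targets another node; the host name `es-host.example.com` and the file name `cricketes_remote.pig` are made up for illustration, and I have not verified this against the 2.0.0.RC1 release specifically.

```shell
# Hypothetical host name; replace it with the address of your own Elasticsearch node.
ES_NODES="es-host.example.com"
ES_PORT="9200"

# Write a variant of the script whose STORE statement passes connection settings to EsStorage.
cat > cricketes_remote.pig <<EOF
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-2.0.0.RC1.jar
REGISTER /user/root/elasticsearch-hadoop-2.0.0.RC1/dist/elasticsearch-hadoop-pig-2.0.0.RC1.jar
cricket = LOAD '/user/root/cricket.txt' AS (fname:chararray, lname:chararray, skill:chararray, email:chararray);
STORE cricket INTO 'pig/cricket' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=${ES_NODES}', 'es.port=${ES_PORT}');
EOF

# Show the generated STORE statement so the substituted values can be checked.
grep 'EsStorage' cricketes_remote.pig
```

Running `pig cricketes_remote.pig` would then submit the job against that node instead of localhost.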
After running the Pig script, I verified the content of the pig/cricket index in ES, and I could see the content of the text file there.
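One way to do this kind of check from the command line is the Elasticsearch search endpoint. The snippet below is a minimal sketch that assumes Elasticsearch is listening on localhost:9200 (its default); the `ES_HOST` variable and the `show_cricket_index` function name are my own and not part of any tool.

```shell
# Assumption: Elasticsearch listens on localhost:9200; set ES_HOST to override.
ES_HOST="${ES_HOST:-localhost:9200}"
SEARCH_URL="http://${ES_HOST}/pig/cricket/_search?pretty"

# Query the pig/cricket index; a search without a size parameter returns at most 10 hits.
show_cricket_index() {
    curl -s "$SEARCH_URL"
}
```

Calling `show_cricket_index` should print a JSON response whose hits contain the fname, lname, skill and email fields of each player, provided the Pig job completed successfully.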
1 comment:
Hi,
Where do you configure the Elasticsearch cluster endpoint?
For example, my Elasticsearch cluster is accepting data at:
http://172.100.34.12:9200
How can I tell the plugin to connect to my cluster?
Thanks