In the "Saving complex object in elasticsearch as output of MapReduce program" entry, I talked about how to use Elasticsearch to store the output of a MapReduce job. In that post I created Contact records that look like this in Elasticsearch:
{
  "lastName":"Tendulkar",
  "address":[
    {
      "country":"India\t",
      "addressLine1":"1 Main Street",
      "city":"Mumbai"
    }
  ],
  "firstName":"Sachin",
  "dateOfBirth":"1973-04-24"
}
I wanted to figure out how to use Elasticsearch as the input for a MapReduce program, so I decided to write a MapReduce program that reads the contact index and reports how many players come from each city. You can download the sample program from here.
This is what my MapReduce program looks like. You can run the driver program with 2 arguments, for example:
hadoop/contact file:///home/user/output/
The first is the name of the Elasticsearch index/type, and the second is the output directory where the output will be written.
This program has 3 main components:
- MRInputDriver: In the driver program you have to set the es.nodes entry to the address of your Elasticsearch installation and the es.resource entry to the name of the Elasticsearch index/type. I then call job.setInputFormatClass(EsInputFormat.class);, which sets the EsInputFormat class as the input format; it takes care of reading the records from Elasticsearch.
- MRInputMapper: The Mapper class declares Object as both its input key type and its input value type. The Elasticsearch Hadoop framework reads each record from Elasticsearch and passes its id as the key (Text) and a MapWritable object representing the stored record as the value. Once I have the value, I read the address from it, and the mapper writes the city name as the key with a value of 1.
- MRInputReducer: The reducer is pretty simple: it gets called with the name of the city as the key and an Iterable of values, which makes it very similar to the reducer in WordCount.
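To make the mapper and reducer logic easier to follow without a cluster, here is a minimal plain-Java sketch of the same flow. It is not the code from the sample program: it uses java.util collections in place of Hadoop's Text and MapWritable, the class name CityCountSketch and its helper methods are hypothetical, and the sample contacts are made up for illustration (only the city matters here). Trimming the city name is my own defensive assumption, since the sample document above shows stray whitespace in the country field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the mapper/reducer pair; no Hadoop classes involved.
class CityCountSketch {

    // Map step: emit one (city, 1) pair per address in a contact record.
    // The record mirrors the JSON document: "address" is a list of address maps.
    static List<Map.Entry<String, Integer>> map(Map<String, Object> contact) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        Object addresses = contact.get("address");
        if (addresses instanceof List<?>) {
            for (Object item : (List<?>) addresses) {
                if (item instanceof Map<?, ?>) {
                    Object city = ((Map<?, ?>) item).get("city");
                    if (city != null) {
                        // trim() is an assumption, not something the original program is known to do
                        emitted.add(Map.entry(city.toString().trim(), 1));
                    }
                }
            }
        }
        return emitted;
    }

    // Reduce step: sum all values received for one city, as in WordCount.
    static int reduce(String city, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every record, groups pairs by key (the shuffle), then reduces.
    static Map<String, Integer> countByCity(List<Map<String, Object>> contacts) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Object> contact : contacts) {
            for (Map.Entry<String, Integer> pair : map(contact)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    // Builds a made-up contact record that carries only an address with a city.
    static Map<String, Object> contact(String city) {
        Map<String, Object> address = new HashMap<>();
        address.put("city", city);
        Map<String, Object> record = new HashMap<>();
        record.put("address", List.of(address));
        return record;
    }

    // Illustrative sample data, not the contents of the real index.
    static List<Map<String, Object>> sampleContacts() {
        return List.of(contact("Bangalore"), contact("Mumbai"), contact("Delhi"),
                contact("Bangalore"), contact("Ranchi"));
    }

    public static void main(String[] args) {
        countByCity(sampleContacts()).forEach((city, n) -> System.out.println(city + " " + n));
    }
}
```

In the real job, the grouping done by countByCity happens in Hadoop's shuffle phase, and the record arrives as a MapWritable rather than a java.util.Map.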
After running the program I could see output like this:
Bangalore 2
Delhi 1
Mumbai 1
Ranchi 1
Great example, thanks. The doc is very confusing; this is a nice example.
Could you please tell me what extra settings are needed in the driver program to connect to an Elasticsearch cluster that sits behind a VIP and runs over HTTPS? The VIP fronts the actual hosts running Elasticsearch on port 9200.