How to use ElasticSearch as input for MapReduce program

In the "Saving complex object in elasticsearch as output of MapReduce program" entry, I talked about how to use ElasticSearch to store the output of a MapReduce job. In that post I created Contact records that look like this in ElasticSearch:

{
   "lastName":"Tendulkar",
   "address":[
      {
         "country":"India\t",
         "addressLine1":"1 Main Street",
         "city":"Mumbai"
      }
   ],
   "firstName":"Sachin",
   "dateOfBirth":"1973-04-24"
}
I wanted to figure out how to use ElasticSearch as input for a MapReduce program, so I created a MapReduce program that reads the contact index and counts how many players come from each city. You can download the sample program from here.

You run the driver program with 2 arguments, for example hadoop/contact file:///home/user/output/. The first is the name of the ElasticSearch index/type and the second is the output directory where the results will be written. The program has 3 main components:
  1. MRInputDriver: In the driver program you have to set es.nodes to the address of your ElasticSearch installation and es.resource to the name of the ElasticSearch index/type. Then I call job.setInputFormatClass(EsInputFormat.class), which makes EsInputFormat the input format; it takes care of reading the records from ElasticSearch.
  2. MRInputMapper: The Mapper class declares Object as both its input key type and its input value type. The ElasticSearch Hadoop framework reads each record from ElasticSearch and passes the document id as the key (a Text) and, as the value, a MapWritable object that represents the document stored in ElasticSearch. Once I have the value, I read the address from it, and the mapper writes the city name as the key and 1 as the value.
  3. MRInputReducer: The reducer is pretty simple: it gets called with the name of a city as the key and an Iterable of values, and it adds those values up. This is very similar to the reducer in WordCount.
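The per-record logic of the mapper and reducer can be sketched in plain Java. This is a minimal, standalone sketch, not the code from the sample program: it assumes a java.util.Map stands in for the MapWritable that es-hadoop hands to the mapper, and a java.util.List stands in for the address array, so it runs without Hadoop or es-hadoop on the classpath. The class and method names (CityCountSketch, mapCities, countByCity) are hypothetical. In the real job, mapCities corresponds to the body of MRInputMapper's map method writing (city, 1) pairs to the Context, and the summing in countByCity corresponds to MRInputReducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone sketch: java.util.Map stands in for the MapWritable
// that es-hadoop passes to the mapper, and List stands in for the address array.
public class CityCountSketch {

    // "Map" step: collect one city name per entry in the document's "address" array.
    // Assumption: one (city, 1) pair per address; the real mapper may use only one address.
    static List<String> mapCities(Map<String, Object> contact) {
        List<String> cities = new ArrayList<>();
        Object address = contact.get("address");
        if (address instanceof List<?>) {
            for (Object entry : (List<?>) address) {
                if (entry instanceof Map<?, ?>) {
                    Object city = ((Map<?, ?>) entry).get("city");
                    if (city != null) {
                        // Trim stray whitespace such as the tab seen in "India\t".
                        cities.add(city.toString().trim());
                    }
                }
            }
        }
        return cities;
    }

    // "Reduce" step: sum the 1s emitted for each city, WordCount-style.
    // TreeMap keeps keys sorted, mirroring a single-reducer job's output order.
    static Map<String, Integer> countByCity(List<Map<String, Object>> contacts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> contact : contacts) {
            for (String city : mapCities(contact)) {
                counts.merge(city, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The Contact document from the JSON above (dateOfBirth omitted).
        Map<String, Object> sachin = Map.of(
            "firstName", "Sachin",
            "lastName", "Tendulkar",
            "address", List.of(Map.of(
                "addressLine1", "1 Main Street",
                "city", "Mumbai",
                "country", "India\t")));
        Map<String, Integer> counts = countByCity(List.of(sachin));
        // Print "city<TAB>count", the default shape of Hadoop's TextOutputFormat.
        counts.forEach((city, n) -> System.out.println(city + "\t" + n));
    }
}
```

For the single document shown earlier, main prints Mumbai, a tab, and 1. With Hadoop in the picture, the same steps are simply split across machines: each mapper handles the documents from its input split, and the framework groups the (city, 1) pairs by city before the reducer sums them.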
After running the program I could see output like this:

Bangalore 2
Delhi 1
Mumbai 1
Ranchi 1

7 comments:

varix said...

Great example.. thanks. The doc is very confusing, this is a nice example.

Lakshminarayana said...

Could you please tell me what extra settings are needed in the driver program to connect to an Elasticsearch cluster that sits behind a VIP and runs over HTTPS?

The VIP fronts the actual hosts, which run Elasticsearch on port 9200.
