Using DistributedCache with a MapReduce job

In the Using third part jars and files in your MapReduce application(Distributed cache) entry I blogged about how to use the distributed cache in Hadoop through command-line options. You also have the option of using the DistributedCache API. To use it programmatically, change your MapReduce driver class to call job.addCacheFile(), following these steps:
  1. To use a file with the DistributedCache API, it has to be available at an hdfs:// or http:// URL that is accessible to all the cluster members. So the first step is to upload the file you are interested in to HDFS. In my case I used the following command to copy the GeoLite2-City.mmdb file to HDFS:
    
    hdfs dfs -copyFromLocal GeoLite2-City.mmdb /GeoLite2-City.mmdb
    
  2. The next step is to change the driver class and add the call job.addCacheFile(new URI("hdfs://localhost:9000/GeoLite2-City.mmdb#GeoLite2-City.mmdb")); This call takes the HDFS URL of the file that you just uploaded and passes it to the distributed cache. The #GeoLite2-City.mmdb fragment at the end tells Hadoop to create a symbolic link with that name pointing to the file.
  3. Now in your Mapper class you can read GeoLite2-City.mmdb through that link using the normal File API.
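
To make step 2 a little more concrete, here is a small sketch of how the URI given to job.addCacheFile() splits into the HDFS location and the link name. It uses only java.net.URI, so it runs without Hadoop on the classpath. The class name CacheUriParts and its two helper methods are hypothetical names made up for this illustration, not part of the Hadoop API.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Hadoop.
class CacheUriParts {

    /** Everything before '#': the location of the file on HDFS. */
    static String location(URI cacheUri) {
        String s = cacheUri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? s : s.substring(0, hash);
    }

    /** The fragment after '#': the symbolic link name (null if there is no fragment). */
    static String linkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    public static void main(String[] args) {
        URI cacheUri = URI.create(
                "hdfs://localhost:9000/GeoLite2-City.mmdb#GeoLite2-City.mmdb");

        System.out.println("location:  " + location(cacheUri));
        System.out.println("link name: " + linkName(cacheUri));

        // In the driver class the same URI would be handed to the job:
        //   job.addCacheFile(cacheUri);
    }
}
```

The link name does not have to match the file name on HDFS; whatever follows the # is the name the Mapper would use in step 3.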
When you use the distributed cache, Hadoop first copies the file specified through the DistributedCache API onto the machine executing the task. You can see the copied file by looking in the MapReduce temp directory on that machine.
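
Since the file ends up on the local disk of the task machine, reading it in the Mapper is ordinary local file I/O. Below is a minimal sketch of that read, assuming the symbolic link shows up in the task's current working directory (that is my understanding of where Hadoop places it). The class CachedFileReader and the method readCached are hypothetical names for this example; in a real job the call would typically sit in the Mapper's setup() method.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper for illustration only; not part of Hadoop.
class CachedFileReader {

    /**
     * Reads a cached file by its link name, resolved against workDir.
     * Note that the name is a plain local path, not an hdfs:// URL:
     * java.io.File always treats its argument as a local file system path.
     */
    static byte[] readCached(File workDir, String linkName) throws IOException {
        File cached = new File(workDir, linkName);
        if (!cached.isFile()) {
            throw new FileNotFoundException(
                    "Cached file not found: " + cached.getAbsolutePath());
        }
        return Files.readAllBytes(cached.toPath());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: inside a task, "." is the directory holding the link.
        byte[] data = readCached(new File("."), "GeoLite2-City.mmdb");
        System.out.println("Read " + data.length + " bytes");
    }
}
```

For a large file such as a GeoIP database you would more likely hand the File object to whatever library parses it instead of loading all the bytes, but the path handling stays the same.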

Comments:

Anonymous said...

Hi

I was trying to add the files to distributed cache and tried retrieving them in my Map class.

The issue that I faced was with the URL.

The MR job errored out with a file-not-found exception.

What I saw was weird.

The file that I passed had the following name (for example):
hdfs://localhost:9000/abc/xyz.txt

I used System.out.println and it was printed correctly in the mapper class. But when I used the same file name in my FileReader object's constructor, it was resolved as

hdfs:/localhost:9000/abc/xyz.txt

I tried adding an additional / but it didn't work.

Can someone please help?

Anonymous said...

Hey, thanks a lot for such an informative article. You saved me many hours. Cheers!
