- To use a file with the DistributedCache API, it has to be available at either an
hdfs:// or http:// URL that is accessible to all the cluster members. So the first step is to upload the file that you are interested in to HDFS. In my case I used the following command to copy the GeoLite2-City.mmdb file to HDFS:
hdfs dfs -copyFromLocal GeoLite2-City.mmdb /GeoLite2-City.mmdb
The next step is to change the Driver class and add a
job.addCacheFile(new URI("hdfs://localhost:9000/GeoLite2-City.mmdb#GeoLite2-City.mmdb")); call. This call takes the HDFS URL of the file that you just uploaded and passes it to the DistributedCache class. The
#GeoLite2-City.mmdb fragment at the end of the URL tells Hadoop to create a symbolic link with that name pointing to the file.
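To see which part of the URL ends up as the link name, you can take the URI apart with plain java.net.URI. This is only a minimal sketch that runs without Hadoop on the classpath. The class name CacheUriDemo and the helper linkName are hypothetical, and falling back to the file name when there is no fragment is an assumption made for this sketch.

```java
import java.net.URI;

public class CacheUriDemo {
    // Hypothetical helper: returns the part after '#', or the last path
    // segment when the URI has no fragment (an assumption for this sketch).
    public static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "hdfs://localhost:9000/GeoLite2-City.mmdb#GeoLite2-City.mmdb");
        // Location of the file in HDFS
        System.out.println("path: " + uri.getPath());
        // Name requested for the symbolic link
        System.out.println("link: " + linkName(uri));
    }
}
```

The path and the fragment are independent, so the link name does not have to match the file name in HDFS.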
Now in your Mapper class you can read
GeoLite2-City.mmdb through that symbolic link using the normal File API.
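Since the link name is a plain relative path, opening the file needs nothing Hadoop-specific. Here is a minimal sketch using only the JDK. The class CachedFileCheck and the method requireCachedFile are hypothetical names, and the sketch assumes the symbolic link sits in the task's current working directory. In a real job a check like this would most likely go into the Mapper's setup() method, before the File is handed to whatever reads the database.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class CachedFileCheck {
    // Hypothetical helper: resolves the link name against the current
    // working directory and fails early if the file is missing or unreadable.
    public static File requireCachedFile(String linkName)
            throws FileNotFoundException {
        File file = new File(linkName);
        if (!file.isFile() || !file.canRead()) {
            throw new FileNotFoundException(
                    "Cached file not found: " + file.getAbsolutePath());
        }
        return file;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Same name as the fragment used in the addCacheFile URI
        File db = requireCachedFile("GeoLite2-City.mmdb");
        System.out.println(db.getName() + " is " + db.length() + " bytes");
    }
}
```

Failing with the absolute path in the message makes it easier to spot a job where the cache file was never registered in the Driver.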