How to control replication for a file stored in HDFS

In Hadoop you can control how many replicas of a file get created. I wanted to try that out, so I experimented with the different options. The first option is to set the replication factor in hdfs-site.xml; the settings in hdfs-site.xml apply to all files (global settings):
  • dfs.replication: Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at creation time.
  • dfs.replication.max: Maximum block replication.
  • dfs.namenode.replication.min: Minimum block replication.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>5</value>
  </property>
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.logging.level</name>
    <value>all</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/data</value>
  </property>
</configuration>
After configuring the global replication settings, I restarted the HDFS daemons and executed hdfs fsck on one of the files to see the effect of the replication settings.
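As a sketch, that check can be run like this (assuming the file is /user/user/aesop.txt, the same file used later in this post); the fsck report shows the replication factor per block and flags any under-replicated or over-replicated blocks:

hdfs fsck /user/user/aesop.txt -files -blocks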
You can also set a specific replication factor for an individual file from the command line by executing the hdfs dfs -setrep command, like this:

hdfs dfs -setrep 5 /user/user/aesop.txt
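If you want the command to block until the target replication is actually reached (handy in scripts), you can add the -w flag:

hdfs dfs -setrep -w 5 /user/user/aesop.txt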
Then I could verify the effect of the replication settings, for example as sketched below.
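One quick check (my choice of verification, not necessarily the one shown in the original run) is hdfs dfs -ls, whose second column is the file's replication factor; running hdfs fsck on the same path gives a more detailed per-block view:

hdfs dfs -ls /user/user/aesop.txt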
The last option is to set the replication factor programmatically, as in the sketch below.
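Here is a minimal sketch of the programmatic option using Hadoop's Java FileSystem API; the path and the factor 5 are just the values carried over from the setrep example above, not anything the API requires:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // Picks up hdfs-site.xml/core-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/user/aesop.txt");

        // Change the replication factor of an existing file; the NameNode
        // then schedules extra replicas (or deletions) in the background.
        boolean accepted = fs.setReplication(file, (short) 5);
        System.out.println("setReplication accepted: " + accepted);

        fs.close();
    }
}

For a brand new file, you can instead pass the replication factor directly to FileSystem.create(path, overwrite, bufferSize, replication, blockSize), so the file is written with the desired factor from the start.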
