How to control replication for a file stored in HDFS

In Hadoop you can control how many replicas of a file are created. I wanted to try this out, so I experimented with the different options. The first option is to set the replication factor in hdfs-site.xml; settings in hdfs-site.xml apply to all files (global settings):
  • dfs.replication: Default block replication. The actual number of replicas can be specified when a file is created; the default is used if no replication factor is specified at creation time.
  • dfs.replication.max: Maximal block replication.
  • dfs.namenode.replication.min: Minimal block replication.
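As an example, the global default replication factor can be set with a fragment like this in hdfs-site.xml (the value 3 shown here is only illustrative):

```xml
<configuration>
  <!-- Default number of replicas for newly created files -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```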
After configuring the global replication settings, I restarted the HDFS daemons and ran hdfs fsck on one of the files to see the effect of the new replication factor.
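A check along these lines reports the replication of each block of a single file (the path is the sample file used in this post; this assumes a running cluster):

```shell
# Report file status and per-block replication for one file
hdfs fsck /user/user/aesop.txt -files -blocks
```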
You can also set the replication factor of an existing file from the command line with the hdfs dfs -setrep command, like this:

hdfs dfs -setrep 5 /user/user/aesop.txt
I then verified the effect of the new replication factor by running hdfs fsck on the file again.
The last option is to set the replication factor programmatically.
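A minimal sketch using the Hadoop FileSystem Java API is shown below; the file path and replication value are illustrative, and it assumes the hadoop-client dependency on the classpath and a reachable cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and other settings from the
        // core-site.xml / hdfs-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Change the replication factor of an existing file
        Path file = new Path("/user/user/aesop.txt");
        boolean ok = fs.setReplication(file, (short) 5);
        System.out.println("setReplication succeeded: " + ok);

        fs.close();
    }
}
```

For new files, FileSystem.create also has overloads that accept a replication factor directly, so the value can be fixed at creation time instead of changed afterwards.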
