How to control replication for a file stored in HDFS

In Hadoop you can control how many replicas of a file get created. I wanted to try that out, so I experimented with the different options. The first option is to set the replication factor in hdfs-site.xml; the settings in hdfs-site.xml apply to all files (global settings):
  • dfs.replication: Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at creation time.
  • dfs.replication.max: Maximum block replication.
  • dfs.namenode.replication.min: Minimum block replication.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>5</value>
  </property>
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.logging.level</name>
    <value>all</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/data</value>
  </property>
</configuration>
After configuring the global replication settings, I restarted the HDFS daemons and executed hdfs fsck on one of the files to see the effect of the replication settings.
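As a sketch, that check can be run like this (assuming the file is /user/user/aesop.txt, the same file used later in this post); the fsck report shows the replication factor per block and flags any under-replicated or over-replicated blocks:

hdfs fsck /user/user/aesop.txt -files -blocks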
You can also set a specific replication factor for an individual file from the command line by executing the hdfs dfs -setrep command, like this:

hdfs dfs -setrep 5 /user/user/aesop.txt
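If you want the command to block until the target replication is actually reached (handy in scripts), you can add the -w flag:

hdfs dfs -setrep -w 5 /user/user/aesop.txt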
Then I could verify the effect of the replication settings, for example as sketched below.
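One quick check (my choice of verification, not necessarily the one shown in the original run) is hdfs dfs -ls, whose second column is the file's replication factor; running hdfs fsck on the same path gives a more detailed per-block view:

hdfs dfs -ls /user/user/aesop.txt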
The last option is to set the replication factor programmatically, as in the sketch below.
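Here is a minimal sketch of the programmatic option using Hadoop's Java FileSystem API; the path and the factor 5 are just the values carried over from the setrep example above, not anything the API requires:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // Picks up hdfs-site.xml/core-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/user/aesop.txt");

        // Change the replication factor of an existing file; the NameNode
        // then schedules extra replicas (or deletions) in the background.
        boolean accepted = fs.setReplication(file, (short) 5);
        System.out.println("setReplication accepted: " + accepted);

        fs.close();
    }
}

For a brand new file, you can instead pass the replication factor directly to FileSystem.create(path, overwrite, bufferSize, replication, blockSize), so the file is written with the desired factor from the start.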
