
How to control replication for a file stored in HDFS

In Hadoop you can control how many replicas of your file get created. I wanted to try that out, so I tried different options. The first option is to set the replication factor in hdfs-site.xml. The settings in hdfs-site.xml are global, so they apply to all files.
  • dfs.replication: Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at create time.
  • dfs.replication.max: Maximal block replication.
  • dfs.namenode.replication.min: Minimal block replication.
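To make the three properties above a bit more concrete, here is a minimal sketch that writes a sample hdfs-site.xml into a scratch directory so you can look at it before copying anything into your real Hadoop configuration directory. The ./conf-sample location and the value 2 for dfs.replication are my assumptions for illustration, not the values used in this post; 512 and 1 are the stock defaults for the max and min properties.

```shell
#!/bin/sh
# Sketch: write a sample hdfs-site.xml into a scratch directory.
# SAMPLE_DIR is a hypothetical location, not the real Hadoop conf dir.
set -eu
SAMPLE_DIR="${SAMPLE_DIR:-./conf-sample}"
mkdir -p "$SAMPLE_DIR"

cat > "$SAMPLE_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Default number of replicas for newly created files (example value) -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Upper bound on block replication (stock default) -->
  <property>
    <name>dfs.replication.max</name>
    <value>512</value>
  </property>
  <!-- Lower bound on block replication (stock default) -->
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "wrote $SAMPLE_DIR/hdfs-site.xml"
```

Keep in mind that dfs.replication only sets the default for files created afterwards; files that already exist keep the replication factor they were written with.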
After configuring the global replication settings, I restarted the HDFS daemons and ran hdfs fsck on one of the files to see the effect of the replication setting.
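A typical fsck invocation for this kind of check might look like the sketch below. The file path is an assumption on my part (it is the aesop.txt example used elsewhere in this post), and the script only calls hdfs when the command is actually on the PATH.

```shell
#!/bin/sh
# Sketch: report block and replica details for a single HDFS file.
# The default path is an assumption borrowed from the aesop.txt example.
FILE="${FILE:-/user/user/aesop.txt}"

if command -v hdfs >/dev/null 2>&1; then
  # -files -blocks -locations lists every block of the file and the
  # datanodes that currently hold a replica of it
  hdfs fsck "$FILE" -files -blocks -locations
else
  echo "hdfs command not found on PATH; run this on a Hadoop client node" >&2
fi
```

The fsck report lists the blocks of the file along with their replica locations, so it shows how many copies actually exist rather than just the configured value.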
You can also use the command line tools to set a particular replication factor for a file by executing the hdfs dfs -setrep command like this:

hdfs dfs -setrep 5 /user/user/aesop.txt
Then I could verify the effect of the new replication setting by checking the file again.
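One way to do that check, as a rough sketch: hdfs dfs -stat with the %r format prints the replication factor recorded for the file, and the second column of hdfs dfs -ls shows the same number. I am assuming the same file path as in the setrep example above.

```shell
#!/bin/sh
# Sketch: read back the replication factor recorded for a file.
# The default path is assumed to match the setrep example.
FILE="${FILE:-/user/user/aesop.txt}"

if command -v hdfs >/dev/null 2>&1; then
  # %r prints only the replication factor of the file
  hdfs dfs -stat "%r" "$FILE"
  # for files, the second column of the listing is the replication factor
  hdfs dfs -ls "$FILE"
else
  echo "hdfs command not found on PATH; run this on a Hadoop client node" >&2
fi
```

These commands show the replication factor stored in the file metadata. On a cluster with fewer datanodes than the requested factor (a single-node setup, for example), the blocks cannot reach that many copies, and hdfs fsck will report them as under-replicated.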
The last option is to set the replication factor programmatically.

Setting replication factor for a file programmatically

I wanted to figure out how to set or change the replication factor for a file stored in HDFS, so I built this sample program, which has 3 methods:
  • read(): This method takes a file path as input and prints its content to the system console
  • write(): This method takes a file path as input and opens it for writing. It takes user input and writes it to the HDFS file
  • setReplication(): This method takes a file path and a replication factor as input and sets the replication factor for the given file path
Once you build the jar, you can execute it with a command like this:

java -jar target/HelloHDFS-jar-with-dependencies.jar read <hdfsfile_path>
For example, in my case I have aesop.txt at hdfs://localhost/user/user/aesop.txt, so I can print it with the following command:

java -jar target/HelloHDFS-jar-with-dependencies.jar read aesop.txt