Finding out the blocks used for storing file in HDFS

The HDFS stores files by breaking them into blocks of 64 MB in size (default block size but you can change it). It stores these blocks across one or more disks/machines, Unlike filesystem for a single disk, a file in HDFS that is smaller than single block does not occupy a fully block's worth of underlying storage. Map tasks in Map Reduce operate on one block at a time (InputSplit takes care of assigning different map task to each of teh block). Replication is handled at block level, what it means is HDFS will replicate the block to different machine and if one of the block is corrupted or one of the machine where the block is stored is down it can read that from different machine, in case of corrupted unreachable block HDFS will take care of replication of the block to bring the replication factor back to the normal level. Some applications may choose to set a high replication factor for blocks in popular file to spread the read load on the cluster I wanted to figure out how the file gets stored in HDFS(/user/user/aesop.txt file as example), so i tried these steps on my local hadoop single-node cluster.
  1. First i looked at my hadoop-site.xml to find out value of dfs.data.dir element, which points to where the data is stored. In my case its /home/user/data directory
    
    <configuration>
      <property>
             <name>dfs.replication</name>
             <value>1</value>
      </property>
     <property>
             <name>dfs.data.dir</name>
             <value>/home/user/data</value>
      </property> 
    </configuration>
    
  2. There is a file /user/user/aesop.txt stored in my HDFS and i wanted to see where the file is stored so i did execute hdfs fsck /user/user/aesop.txt -blocks -files command to get list of blocks where this file is located
    It is stored in BP-1574103969-127.0.1.1-1403309876533:blk_1073742129_1306
  3. When i looked under /home/user/data/current directory i saw BP-1574103969-127.0.1.1-1403309876533 which gives me first part of the block name, and it has bunch of subdirectories
  4. The next part was to find blk_1073742129_1306 which is second part of the block name so i did search for it without _1306 the last part and i found .meta file and a normal file that contains the actual content of aesop.txt, when i opened it i could see the content of the file like this
Note: I found this on my local hadoop, but havent seen this method in documentation so it might not work for you. But the good part is as long as your HDFS commands are working you dont need to worry about this.

6 comments:

Abhishek Singh said...

were you able to read the meta file? does it contain the physical disk memory address in which the file data is stored?

Karthika Shree said...

Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
Hadoop Training in Chennai

Stumagz.com said...

Hi Your Blog Is Very Nice! Attractive. Content is Nice

Stumagz is ultimate platform to release student magazine articles and engineering college news. It helps students to share their ideas in form of article.

stuMagz is an online platform that brings all the students and colleges together. Despite the fact that NBA accreditation says that every college must maintain a magazine to publish their content, there is no proper platform for students to expose themselves to the ecosystem and the lack of reach for the students to the various opportunities present in the market.There is a lack of connectivity between different colleges and universities.

The idea of stuMagz was born to bridge this gap and expand the scope for learning. It is a simple and efficient platform which provides every college and its students, a hassle free experience to publish their content and unleash their creativity.

Student Magazine Articles

Digital Campus Eco-System

Digital Stories In Hyderabad

Digital Classrooms In Hyderabad

Student Magazine Subscriptions

College Magazine Articles

College Fest Event

Top Engineering Colleges In India

For more Details Visit Us: Stumagz.com

srjwebsolutions said...

We are leading responsive website designing and development company in Noida.
We are offering mobile friendly responsive website designing, website development, e-commerce website, seo service and sem services in Noida.

Responsive Website Designing Company in Noida
Website Designing Company in Noida
SEO Services in Noida
SMO Services in Noida

Vikas Chaudhary said...

Battery Mantra is Authorized exide car battery dealer in Noida and Greater Noida. We are providing our service in Indirapuram, Delhi, Ashok Nagar.

Exide Battery Dealer in Noida
Battery Dealer in Noida
Authorized Battery Dealer in Noida
Car Battery Dealer in Noida
Car Battery Dealer
Exide Battery Dealer

EG MEDI said...

Egmedi.com is online medical store pharmacy in laxmi nagar Delhi. You can Order prescription/OTC medicines online. Cash on Delivery available. Free Home Delivery


Online Pharmacy in Delhi
Buy Online medicine in Delhi
Online Pharmacy in laxmi nagar
Buy Online medicine in laxmi nagar
Onine Medical Store in Delhi
Online Medical store in laxmi nagar
Online medicine store in delhi
online medicine store in laxmi nagar
Purchase Medicine Online
Online Pharmacy India
Online Medical Store