How reading and writing of files in HDFS works

Read Path
  1. The client program starts with the Hadoop library JAR and a copy of the cluster configuration data, which specifies the location of the namenode.
  2. The client begins by contacting the namenode, indicating the file it wants to read.
  3. The namenode validates the client's identity, either by simply trusting the client or by using an authentication protocol such as Kerberos.
  4. The client's identity is then checked against the owner and permissions of the file.
  5. The namenode responds to the client with the first block ID and the list of datanodes on which a copy of the block can be found, sorted by their distance to the client. Distance to the client is measured according to Hadoop's rack topology.
  6. With the block IDs and datanode hostnames, the client can now contact the most appropriate datanode directly and read the block data it needs. This process repeats until all the blocks in the file have been read or the client closes the file stream.
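
The read steps above can be sketched as a small Python toy model. This is not the Hadoop client API: ToyNameNode, read_file, the '/rack/host' location strings and the block IDs are all made up for illustration. Authentication and permission checks are left out, and the toy namenode returns every block's locations in a single call to keep the example short. The distance function assumes the usual rack-topology convention of counting hops up to the closest common ancestor and back down.

```python
# Toy model of the HDFS read path. All names are hypothetical,
# not part of the real Hadoop API.

def distance(a, b):
    """Hops between two '/rack/host' locations via their closest common ancestor."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


class ToyNameNode:
    def __init__(self, block_map, replicas):
        self.block_map = block_map  # file path -> ordered list of block IDs
        self.replicas = replicas    # block ID -> datanode locations holding a copy

    def get_block_locations(self, path, client):
        """Return (block_id, datanodes sorted nearest-first) for each block."""
        return [
            (blk, sorted(self.replicas[blk], key=lambda dn: distance(client, dn)))
            for blk in self.block_map[path]
        ]


def read_file(namenode, datanode_storage, path, client):
    """Read each block from its closest replica and concatenate the data."""
    data = b""
    for blk, locations in namenode.get_block_locations(path, client):
        nearest = locations[0]  # most appropriate datanode for this client
        data += datanode_storage[nearest][blk]
    return data


# Hypothetical cluster: two racks, four datanodes, one file of two blocks.
replicas = {
    "blk_1": ["/rack2/dn3", "/rack1/dn1", "/rack1/dn2"],
    "blk_2": ["/rack2/dn4", "/rack2/dn3", "/rack1/dn2"],
}
storage = {
    "/rack1/dn1": {"blk_1": b"hello "},
    "/rack1/dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "/rack2/dn3": {"blk_1": b"hello ", "blk_2": b"world"},
    "/rack2/dn4": {"blk_2": b"world"},
}
namenode = ToyNameNode({"/data/file.txt": ["blk_1", "blk_2"]}, replicas)
client = "/rack1/dn1"  # the client happens to run on a datanode

print(read_file(namenode, storage, "/data/file.txt", client))
```

In this sketch the client reads blk_1 from its own node and blk_2 from a node on the same rack, so neither read has to cross to the other rack.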

Write Path
  1. The client makes a request to open a file for writing using the Hadoop FileSystem APIs.
  2. A request is sent to the namenode to create the file metadata, provided the user has the necessary permission to do so. The new file initially has no associated blocks.
  3. The namenode responds to the client, indicating that the request was successful and that it should start writing data.
  4. The client library sends a request to the namenode asking for a set of datanodes to which the data should be written, and receives a list of datanodes in response.
  5. The client connects to the first datanode, which in turn connects to the second, and the second datanode connects to the third.
  6. The client starts writing data to the first datanode, which writes the data to disk as well as to its connection to the second datanode. The second datanode writes the data to disk and forwards it over its connection to the third datanode, and so on.
  7. Once the client has finished writing, it closes the stream, which flushes any remaining data so that it is written to disk.
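
The pipeline in steps 5 to 7 can be sketched with another small Python toy model. Again, this is not the Hadoop API: ToyDataNode, build_pipeline, write_file and the tiny packet_size are assumptions chosen for illustration, and the namenode calls from steps 1 to 4 are skipped by handing the client a ready-made list of datanodes.

```python
# Toy model of the HDFS write pipeline. All names are hypothetical,
# not part of the real Hadoop API.

class ToyDataNode:
    def __init__(self, name):
        self.name = name
        self.disk = []          # packets stored locally, in arrival order
        self.downstream = None  # next datanode in the pipeline, if any

    def connect(self, downstream):
        self.downstream = downstream

    def receive(self, packet):
        """Store the packet locally, then forward it down the pipeline."""
        self.disk.append(packet)
        if self.downstream is not None:
            self.downstream.receive(packet)


def build_pipeline(datanodes):
    """Chain the datanodes in list order and return the head of the pipeline."""
    for upstream, downstream in zip(datanodes, datanodes[1:]):
        upstream.connect(downstream)
    return datanodes[0]


def write_file(datanodes, data, packet_size=4):
    """Split data into packets and send each one to the first datanode only."""
    head = build_pipeline(datanodes)
    for i in range(0, len(data), packet_size):
        head.receive(data[i:i + packet_size])


# Pretend the namenode returned these three datanodes for the new block.
nodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
write_file(nodes, b"hello hdfs")

for dn in nodes:
    print(dn.name, b"".join(dn.disk))
```

The client only ever talks to the first datanode, yet every datanode in the chain ends up holding the full data, which is the point of the pipeline.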
