An HDFS cluster has two main types of nodes operating in a master-worker pattern:
- NameNode: Manages the filesystem's directory structure and the metadata for all files. This information is persisted on local disk in two kinds of files:
- fsimage: The master copy (a checkpoint) of the filesystem metadata.
- edits: A log of the changes (deltas/modifications) made to the metadata since the last checkpoint. In newer versions of Hadoop (I am using 2.4) there are multiple edit-log segment files, each covering a range of transactions.
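On disk, the directory configured by dfs.namenode.name.dir typically contains something like the following (the transaction IDs here are illustrative, not taken from a real cluster):

```
current/
  VERSION
  fsimage_0000000000000000042                      # checkpoint up to transaction 42
  fsimage_0000000000000000042.md5
  edits_0000000000000000043-0000000000000000060    # a finalized edit-log segment
  edits_inprogress_0000000000000000061             # segment currently being written
  seen_txid
```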
In addition, the NameNode also holds a mapping of each block to the DataNodes where that block is stored, but that mapping does not get persisted to disk. Instead, DataNodes send the list of blocks they hold to the NameNode on startup, and the NameNode keeps it in memory (you can inspect this mapping with hdfs fsck <path> -files -blocks -locations).
The NameNode serves all filesystem metadata entirely from RAM for fast lookup and retrieval, which places a cap on how much metadata a single NameNode can handle.
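To get a rough sense of that cap, a commonly cited rule of thumb (an assumption, not an exact figure) is about 150 bytes of NameNode heap per filesystem object (file, directory, or block):

```shell
# Back-of-the-envelope NameNode heap sizing.
# Assumption: ~150 bytes of heap per filesystem object (rule of thumb, not exact).
objects=100000000            # 100 million files + blocks
bytes_per_object=150
heap_bytes=$((objects * bytes_per_object))
echo "$((heap_bytes / 1024 / 1024 / 1024)) GB"   # roughly 13-14 GB of heap
```

So a cluster with hundreds of millions of small files needs a NameNode with tens of gigabytes of RAM, regardless of how much disk the DataNodes have.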
- Secondary NameNode: Its job is to merge the fsimage and edits files on behalf of the primary NameNode. The basic issue is that it is very CPU- and memory-intensive to take the fsimage and apply all the edits to it, so that work is delegated to the Secondary NameNode: it downloads the fsimage and edits files from the primary, merges them into a new fsimage (a checkpoint), and sends the result back to the primary. Note that despite the name, it is not a hot standby for the NameNode.
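How often this checkpoint merge happens is configurable in hdfs-site.xml; the values below are, to my knowledge, the Hadoop 2.x defaults:

```xml
<!-- hdfs-site.xml: checkpoint tuning (values shown are the 2.x defaults) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>    <!-- checkpoint at least once an hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- ...or after 1M uncheckpointed transactions -->
</property>
```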
- DataNode: This is the workhorse daemon, responsible for storing and retrieving blocks of data. It also maintains a block report (the list of blocks stored on that DataNode). It sends a heartbeat to the NameNode at a regular interval (every 3 seconds by default) and periodically sends a full block report (every 6 hours by default).
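Both intervals are configurable in hdfs-site.xml; the values below are, as far as I know, the Hadoop 2.x defaults:

```xml
<!-- hdfs-site.xml: DataNode reporting intervals (values shown are the 2.x defaults) -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>          <!-- heartbeat every 3 seconds -->
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>   <!-- full block report every 6 hours -->
</property>
```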
There are two ways to start the daemons necessary for HDFS. You can start them individually using
hadoop-daemon.sh start <daemontype>    e.g. hadoop-daemon.sh start namenode
or you can start all of them at once using
start-dfs.sh