- Client The client submits the MapReduce job. Client is responsible for copying the map reduce related jars, configuration files and the distributed cache related files(jars, archives and files) into HDFS for distribution. It is also responsible for splitting the input into pieces and saving that information into HDFS for later use
- Application Master To manage lifecycle of application running on the cluster. When the map reduce framework starts application it creates application master on one of the worker nodes and the application master runs for the lifecycle of application/job
- Application master negotiates with the resource manager for cluster resources - described in terms of a number of containers, each with certain memory limit. Application master is also responsible for tracking the application progress such as status as well as counters across application.
- Resource Manager To manage the use of compute resources(Resources available for running map and reduce related tasks) across the cluster
- Node Manager node managers runs the application specific processes in the containers. The node manager ensures that the application does not use more resources than it has been allocated.
How MapReduce job in Yarn framework works
In case of Yarn MapReduce framework these are the main players involved in the process of running mapreduce application.