Apache Hadoop is widely used for big data processing and distributed computation. In the Hadoop ecosystem, data are stored and managed by the Hadoop Distributed File System (HDFS), in which the NameNode machine is a single point of failure. Although HDFS Federation and HDFS High Availability address this problem, they do so at the significant cost of extra server hardware. We therefore aim to improve the availability of the NameNode service in a more cost-effective way. The primary innovation is the joint consideration of MapReduce jobs and the HDFS operations they generate. Specifically, for each job we dynamically allocate a SubNameNode on one of the existing TaskTrackers to provide the NameNode service. Since the load of the single NameNode is naturally distributed across the SubNameNodes, the failure rate of the NameNode machine can be reduced. Moreover, because each SubNameNode is more local to the participating TaskTrackers, those TaskTrackers can access the NameNode service more efficiently.
International Journal of Web and Grid Services 10(4), pp.319-337