Hadoop Distributed File System (HDFS) is a popular cloud storage system that scales easily to meet the growing demand for storage capacity. In HDFS, files are divided into fixed-size blocks, which are then replicated and stored on randomly chosen DataNodes to prevent data loss. The random nature of this default block placement strategy can leave the load unevenly distributed among the DataNodes. Although HDFS has a built-in utility for load balancing, it reduces system performance because blocks must be moved between nodes. In this paper, we take a holistic approach to load balancing by considering all situations that may influence the load-balancing state. We design a new role, named BalanceNode, that matches heavily loaded DataNodes with lightly loaded ones, so that the lightly loaded nodes can take over part of the load from the heavily loaded ones. We also design an improved block placement strategy that keeps the storage load as balanced as possible from the outset. Simulation results show that our approach achieves a better load-balancing state than existing algorithms.
International Journal of Web and Grid Services, 13(4), pp. 448-466
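
The idea of matching heavily loaded DataNodes with lightly loaded ones can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a simple greedy pairing under stated assumptions: load is measured in stored blocks, a node counts as heavy or light when its utilization deviates from the cluster average by more than a tolerance, and the names `DataNode`, `plan_moves`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for an HDFS DataNode; load is counted in blocks."""
    name: str
    used: int       # blocks currently stored
    capacity: int   # maximum number of blocks the node can hold

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def plan_moves(nodes, threshold=0.1):
    """Greedily pair heavy nodes with light nodes.

    Returns a list of (source, destination, block_count) tuples.
    `threshold` is an assumed tolerance around the cluster-wide utilization.
    """
    avg = sum(n.used for n in nodes) / sum(n.capacity for n in nodes)

    # Blocks each heavy node should shed, and each light node can absorb,
    # to reach the cluster-average utilization.
    surplus = {n.name: n.used - int(avg * n.capacity)
               for n in nodes if n.utilization > avg + threshold}
    deficit = {n.name: int(avg * n.capacity) - n.used
               for n in nodes if n.utilization < avg - threshold}

    heavy = sorted(surplus, key=surplus.get, reverse=True)
    light = sorted(deficit, key=deficit.get, reverse=True)

    moves = []
    i = j = 0
    while i < len(heavy) and j < len(light):
        src, dst = heavy[i], light[j]
        count = min(surplus[src], deficit[dst])
        if count > 0:
            moves.append((src, dst, count))
        surplus[src] -= count
        deficit[dst] -= count
        # Advance past any node that has reached the average.
        if surplus[src] == 0:
            i += 1
        if deficit[dst] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    cluster = [DataNode("A", 90, 100), DataNode("B", 10, 100), DataNode("C", 50, 100)]
    for src, dst, count in plan_moves(cluster):
        print(f"move {count} blocks from {src} to {dst}")
```

In this sketch, nodes within the tolerance band are left untouched, which limits the number of block transfers; a real matching component would also have to respect replica placement constraints, which the sketch ignores.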