Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments

Abstract

Data Locality is one of the critical factors to affect performance. This paper proposes a next-k-node scheduling (NKS) method to improve the data locality of map tasks. The method first calculates the probabilities of each map task, and then preferentially schedules the one with the highest probability. It generates low probabilities for the tasks which satisfy node locality with the nodes to issue requests, so it can reserve these tasks to these nodes. We have implemented the NKS method in hadoop-0.20.2. The experiment results have shown that the NKS method reduced 78% of the map tasks processed without node locality, reduced 77%of the network load caused by the tasks, and improved the performance of Hadoop MapReduce when comparing with the default task scheduling method in Hadoop. Obviously, the NKS method is very suitable for the homogeneous environment with network overload.

References

Page 1

	Year	Citations

Page 1