Publication | Closed Access
Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments
Citations: 79 | References: 9 | Year: 2011 | Venue: Unknown
Keywords: Cluster Computing, Network Overload, Engineering, Computer Architecture, Map-reduce, Node Locality, Distributed Data Analytics, Data Science, Parallel Computing, Data Management, Job Scheduler, Cloud Scheduling, Computer Science, Data-intensive Computing, Data Locality, Scalable Computing, Homogeneous Computing Environments, Edge Computing, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data
Data locality is one of the critical factors affecting MapReduce performance. This paper proposes a next-k-node scheduling (NKS) method to improve the data locality of map tasks. The method first calculates a probability for each map task and then preferentially schedules the task with the highest probability. It assigns low probabilities to tasks that have node locality with the nodes expected to issue the next requests, so that these tasks are reserved for those nodes. We have implemented the NKS method in Hadoop 0.20.2. The experimental results show that, compared with Hadoop's default task scheduling method, NKS reduced the number of map tasks processed without node locality by 78%, reduced the network load caused by these tasks by 77%, and improved the performance of Hadoop MapReduce. The NKS method is therefore well suited to homogeneous environments with network overload.
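The abstract describes NKS only at a high level, so the Python sketch below illustrates the general idea under explicit assumptions rather than reproducing the paper's method. It assumes the list of the next k requesting nodes is already known (the abstract does not say how it is predicted), that a task node-local to the current requester is taken first, and it uses a hypothetical scoring function, `locality_probability`, in place of the paper's actual probability formula. The names `MapTask`, `locality_probability`, and `pick_task` are illustrative and are not Hadoop APIs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MapTask:
    task_id: str
    # Nodes that store a replica of this task's input split.
    replica_nodes: frozenset


def locality_probability(task: MapTask, next_k_nodes: Sequence[str]) -> float:
    """Hypothetical score (not the paper's formula): the fraction of the
    predicted next-k requesters that would NOT be node-local for this task.
    Tasks whose data lives on an upcoming requester score low, so they are
    held back for that node."""
    if not next_k_nodes:
        return 1.0
    local_hits = sum(1 for node in next_k_nodes if node in task.replica_nodes)
    return 1.0 - local_hits / len(next_k_nodes)


def pick_task(requester: str, pending: Sequence[MapTask],
              next_k_nodes: Sequence[str]) -> Optional[MapTask]:
    """Choose a map task for `requester` from `pending` (None if empty)."""
    if not pending:
        return None
    # Assumption: a task that is node-local to the requester is taken first.
    for task in pending:
        if requester in task.replica_nodes:
            return task
    # Otherwise schedule the task with the highest score; max() keeps the
    # first task on ties, so the choice is deterministic.
    return max(pending, key=lambda t: locality_probability(t, next_k_nodes))


if __name__ == "__main__":
    tasks = [
        MapTask("m1", frozenset({"n2", "n3"})),
        MapTask("m2", frozenset({"n4", "n5"})),
        MapTask("m3", frozenset({"n6"})),
    ]
    # n1 holds no input split, and n2, n3, n4 are assumed to request next.
    chosen = pick_task("n1", tasks, next_k_nodes=["n2", "n3", "n4"])
    print(chosen.task_id)  # prints "m3"
```

In this example, `m1` scores 1/3 and `m2` scores 2/3 because their data sits on upcoming requesters, while `m3` scores 1.0, so the non-local request from `n1` consumes `m3` and leaves `m1` and `m2` for nodes that can run them locally.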