Publication | Closed Access
Scalable and robust key group size estimation for reducer load balancing in MapReduce
Citations: 24
References: 11
Year: 2013
Unknown Venue
Cluster Computing, Load Balancing (Computing), Engineering, Workload Imbalance, Map-reduce, Distributed Data Analytics, Data Science, Data Mining, Parallel Computing, Data Management, Computer Engineering, Optimal Packing Algorithm, Computer Science, Data-intensive Computing, Scalable Computing, Reduce-phase Skew, Cloud Computing, Parallel Programming, Reducer Load, Massive Data Processing, Big Data
Modern parallel computing systems, such as MapReduce, often assume that data values are uniformly distributed. In practice, however, data is often highly skewed, which can cause workload imbalance among tasks running in parallel. In this paper, we study the reduce-phase skew problem in MapReduce, where reduce tasks are often assigned imbalanced loads (in terms of key groups). We introduce a sketch-based data structure for capturing MapReduce key group size statistics and present an optimal packing algorithm that assigns the key groups to the reducers in a load-balancing manner. We perform an empirical evaluation with several real and synthetic datasets over two distinct types of applications. The results show that our load balancing algorithm can strongly mitigate reduce-phase skew. It can decrease the overall job completion time by 45.5% compared with the default settings in Hadoop and by 38.3% compared with the state-of-the-art solution.
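The general idea of estimating key group sizes with a sketch and then packing the groups onto reducers can be illustrated with a minimal Python sketch. The abstract does not specify the paper's data structure or its packing algorithm, so everything here is an assumption for illustration: a Count-Min sketch stands in for the sketch-based structure, and a greedy largest-first heuristic (not an optimal packing) stands in for the assignment step. The names `CountMinSketch` and `assign_key_groups` are hypothetical.

```python
import hashlib
import heapq


class CountMinSketch:
    """Assumed stand-in for the paper's sketch; it may overestimate, never underestimate."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row, so each row behaves like an independent hash function.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated counter.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))

    def merge(self, other):
        # Sketches built by different mappers combine by cell-wise addition.
        for r in range(self.depth):
            for c in range(self.width):
                self.rows[r][c] += other.rows[r][c]


def assign_key_groups(sizes, num_reducers):
    """Greedy heuristic: place the largest group on the currently least loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for key, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + size, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += sizes[key]
    return assignment, loads


# Two hypothetical mappers each summarize their local keys, then the sketches are merged.
mapper_a, mapper_b = CountMinSketch(), CountMinSketch()
for key, count in {"a": 30, "b": 20, "c": 10}.items():
    mapper_a.add(key, count)
for key, count in {"a": 20, "b": 10, "d": 5, "e": 5}.items():
    mapper_b.add(key, count)
mapper_a.merge(mapper_b)

estimated = {key: mapper_a.estimate(key) for key in ["a", "b", "c", "d", "e"]}
assignment, loads = assign_key_groups(estimated, num_reducers=2)
print(assignment, loads)
```

The sketch keeps per-mapper memory fixed regardless of the number of distinct keys, at the cost of possible overestimates when keys collide. The greedy step is a common approximation for this kind of assignment and is used here only because it is short; it should not be read as the paper's algorithm.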