Publication | Closed Access
The Case for Evaluating MapReduce Performance Using Workload Suites
Citations: 430
References: 12
Year: 2011
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Map-reduce, Mapreduce Workloads, Data Science, Data-intensive Platform, Parallel Computing, Data Management, High-performance Data Analytics, Workload Suites, Computer Science, Data-intensive Computing, Performance Scalability, Edge Computing, Cloud Computing, Parallel Programming, Rich Workload Characteristics, Massive Data Processing, Big Data
MapReduce systems face growing challenges from data and computation diversity, and existing benchmarks lack realistic, workload‑specific performance insights needed for provisioning and managing large‑scale clusters. The paper argues for moving beyond traditional benchmarks, demonstrating that they miss key workload characteristics and proposing a framework to synthesize and run representative workloads. The authors analyze two production MapReduce traces to create a workload vocabulary and use it to design a framework that synthesizes realistic workloads. Using realistic workloads enables operators to pinpoint workload‑specific bottlenecks and scheduler choices, and the authors anticipate that workload suites will empower operators to tackle tasks currently beyond reach.
MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply. In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture the rich workload characteristics observed in traces, and propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers. We expect that, once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.