Publication | Closed Access
Adaptive MapReduce using situation-aware mappers
Citations: 67
References: 32
Year: 2012
Unknown Venue
Cluster Computing, Engineering, Adaptive Mappers, Computer Architecture, Partial Aggregates, Map-reduce, Distributed Data Analytics, Data Science, Balanced Partitions, Data Integration, Parallel Computing, Data Management, Adaptive Mapreduce, High-performance Data Analytics, Computer Science, Distributed Query Processing, Data-intensive Computing, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data
We propose new adaptive runtime techniques for MapReduce that improve performance and simplify job tuning. We implement these techniques by breaking a key assumption of MapReduce that mappers run in isolation. Instead, our mappers communicate through a distributed metadata store and are aware of the global state of the job. However, we still preserve the fault-tolerance, scalability, and programming API of MapReduce. We utilize these "situation-aware mappers" to develop a set of techniques that make MapReduce more dynamic: (a) Adaptive Mappers dynamically take multiple data partitions (splits) to amortize mapper start-up costs; (b) Adaptive Combiners improve local aggregation by maintaining a cache of partial aggregates for the frequent keys; (c) Adaptive Sampling and Partitioning sample the mapper outputs and use the obtained statistics to produce balanced partitions for the reducers. Our experimental evaluation shows that the adaptive techniques provide up to a 3x performance improvement in some cases and dramatically improve performance stability across the board.
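To make technique (b) concrete, here is a minimal sketch of a mapper-side cache of partial aggregates. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the cache is bounded or which entries are evicted, so the fixed `capacity` and the least-recently-used eviction used here are assumptions, and the names `CachingCombiner` and `combine_stream` are hypothetical.

```python
from collections import OrderedDict


class CachingCombiner:
    """Bounded cache of partial aggregates (hypothetical sketch).

    Assumption: the least-recently-used entry is evicted when the cache
    is full. The abstract does not specify an eviction policy.
    """

    def __init__(self, capacity, combine):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.combine = combine          # associative function, e.g. addition
        self.cache = OrderedDict()      # key -> partial aggregate
        self.emitted = []               # records passed on toward the reducers

    def add(self, key, value):
        if key in self.cache:
            # Frequent keys are aggregated in place and kept "hot".
            self.cache[key] = self.combine(self.cache[key], value)
            self.cache.move_to_end(key)
            return
        if len(self.cache) >= self.capacity:
            # Cache is full: emit the least-recently-used partial aggregate.
            self.emitted.append(self.cache.popitem(last=False))
        self.cache[key] = value

    def flush(self):
        # At the end of the map task, emit whatever is still cached.
        self.emitted.extend(self.cache.items())
        self.cache.clear()
        return self.emitted


def combine_stream(pairs, capacity):
    """Run (key, value) pairs through the cache, summing values per key."""
    combiner = CachingCombiner(capacity, lambda a, b: a + b)
    for key, value in pairs:
        combiner.add(key, value)
    return combiner.flush()
```

In a word-count setting, feeding the nine pairs for `"a b a c a b a d a"` through `combine_stream` with `capacity=2` yields five output records instead of nine: the frequent key `a` stays in the cache and is emitted once with its full count, while the rare keys pass through with partial counts that the reducer still has to sum.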