Publication | Closed Access
Scarlett
Citations: 284
References: 26
Year: 2011
Unknown Venue
Distributed File System, Cluster Computing, Popular MapReduce Frameworks, Engineering, Data Science, Cloud Computing, Data-intensive Platform, Data Integration, Massive Data Processing, Map-reduce, Distributed Data Store, Data Management, Resilience MapReduce Frameworks, Big Data, Data Availability
To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, increasing the completion times of jobs that access this data even when other machines in the cluster have spare cycles. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop, Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by 20.2%.
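The idea of replicating by popularity within a hard storage bound can be illustrated with a minimal sketch. This is not Scarlett's actual algorithm, which the abstract does not specify. It assumes a uniform baseline of three replicas, uses peak concurrent accesses as a stand-in for predicted popularity, and hands out extra replicas round-robin from the most popular file down. The names `FileStats`, `plan_replicas`, and `extra_storage_gb` are hypothetical.

```python
from dataclasses import dataclass

BASE_REPLICAS = 3  # assumption: uniform default replication factor


@dataclass
class FileStats:
    name: str
    size_gb: float
    peak_concurrent_accesses: int  # assumed proxy for predicted popularity


def plan_replicas(files, extra_storage_gb, base=BASE_REPLICAS):
    """Return {file name: replica count} without exceeding the extra storage budget."""
    plan = {f.name: base for f in files}
    remaining = extra_storage_gb
    # Visit the most popular files first in every round.
    ranked = sorted(files, key=lambda f: f.peak_concurrent_accesses, reverse=True)

    progress = True
    while progress:
        progress = False
        for f in ranked:
            # A file wants another replica while it has fewer copies
            # than its observed peak number of concurrent readers.
            wants_more = plan[f.name] < f.peak_concurrent_accesses
            if wants_more and f.size_gb <= remaining:
                plan[f.name] += 1
                remaining -= f.size_gb
                progress = True
    return plan


files = [
    FileStats("hot", 10, 6),
    FileStats("warm", 10, 4),
    FileStats("cold", 10, 1),
]
plan = plan_replicas(files, extra_storage_gb=30)
```

With a 30 GB budget, the sketch gives the two extra copies to `hot` and one to `warm`, while `cold` stays at the baseline. Granting one replica per file per round, rather than fully satisfying the hottest file first, is a design choice made here to spread a tight budget across several popular files.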