Publication | Open Access
Online aggregation for large MapReduce jobs
Citations: 180 | References: 17 | Year: 2011
Cluster Computing, Engineering, Database System, Data Science, Distributed Data Analytics, Cloud Computing, Data Integration, Online Aggregation, Parallel Programming, Computer Science, Distributed Query Processing, Scalable Computing, Parallel Computing, Map-reduce, Data Management, Massive Data Processing, Big Data
In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, along with confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion and then save money by killing the computation early once sufficient accuracy has been obtained.
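The idea of a running estimate whose confidence bounds tighten until the user stops the job can be illustrated with a minimal single-machine sketch. It is not the estimator developed in the paper: it assumes records arrive in random order and uses a textbook normal-approximation interval for an AVG query. The function name `online_mean`, its parameters, and the synthetic data are all hypothetical.

```python
import math
import random


def online_mean(stream, z=1.96, target_half_width=None, min_samples=30):
    """Yield (n, estimate, half_width) as records of a randomly ordered stream arrive.

    Assumption: a simple normal-approximation interval, not the paper's estimator.
    Stops early once the half-width falls to target_half_width (if given).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue  # a variance estimate needs at least two records
        variance = m2 / (n - 1)
        half_width = z * math.sqrt(variance / n)
        yield n, mean, half_width
        if (target_half_width is not None and n >= min_samples
                and half_width <= target_half_width):
            return  # analogous to killing the job once accuracy suffices


# Synthetic data standing in for the values an aggregation query would scan.
random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
random.shuffle(data)  # random arrival order is what justifies the interval

snapshots = list(online_mean(data, target_half_width=0.5))
n_used, estimate, half_width = snapshots[-1]
print(f"stopped after {n_used} of {len(data)} records: "
      f"{estimate:.2f} +/- {half_width:.2f}")
```

Each yielded tuple corresponds to one progress report shown to the user. In a real MapReduce setting, records arrive in blocks whose order is not guaranteed to be random, so this simple interval would not be valid there without further work.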