Publication | Closed Access
HaLoop
Citations: 685
References: 13
Year: 2010
Cluster Computing, Engineering, Hadoop Mapreduce Framework, Map-reduce, Data Science, Data Mining, Data Integration, Parallel Computing, Data Management, High-performance Data Analytics, Web Ranking, Knowledge Discovery, Computer Science, Distributed Query Processing, Data-intensive Computing, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data
Large-scale data mining requires highly scalable platforms. MapReduce and Dryad are popular, but they lack built-in support for the iterative programs common in data mining, web ranking, graph analysis, and model fitting. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework designed to support these iterative applications. HaLoop extends MapReduce with programming support for iteration, a loop-aware task scheduler, and caching mechanisms, and it was evaluated on real queries and real datasets. Compared with Hadoop, HaLoop reduces query runtimes by a factor of 1.85 on average and shuffles only 4% of the data between mappers and reducers.
The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications, including data mining, web ranking, graph analysis, and model fitting. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, but also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluated HaLoop on real queries and real datasets. Compared with Hadoop, HaLoop on average reduces query runtimes by a factor of 1.85 and shuffles only 4% of the data between mappers and reducers.
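To make the caching idea concrete, the following is a minimal sketch, not HaLoop's actual API or implementation. It assumes a toy rank-propagation job in which the link table is loop-invariant and the rank table changes every iteration. The function `run_iterations`, the flag `cache_invariant`, and the shuffle counter are hypothetical names introduced only for illustration. The sketch counts how many records would be regrouped ("shuffled") with and without a cache of the invariant data.

```python
from collections import defaultdict


def run_iterations(links, iterations, cache_invariant):
    """Toy rank propagation. Returns (ranks, records_shuffled).

    Hypothetical illustration only: `cache_invariant` stands in for a
    cache that keeps the grouped, loop-invariant link table between
    iterations instead of regrouping it every time.
    """
    nodes = sorted({node for edge in links for node in edge})
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    cached_out = None  # grouped link table kept across iterations
    shuffled = 0

    for _ in range(iterations):
        if cached_out is not None:
            # Invariant data is already grouped locally: nothing to shuffle.
            out = cached_out
        else:
            # Group edges by source node, counting every edge as shuffled.
            out = defaultdict(list)
            for src, dst in links:
                out[src].append(dst)
            shuffled += len(links)
            if cache_invariant:
                cached_out = out

        # The rank table changes each iteration, so it is always shuffled.
        shuffled += len(ranks)

        # Each node splits its rank evenly among its outgoing links.
        new_ranks = {node: 0.0 for node in nodes}
        for src, dsts in out.items():
            share = ranks[src] / len(dsts)
            for dst in dsts:
                new_ranks[dst] += share
        ranks = new_ranks

    return ranks, shuffled


if __name__ == "__main__":
    links = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
    _, plain = run_iterations(links, 5, cache_invariant=False)
    _, cached = run_iterations(links, 5, cache_invariant=True)
    print(f"records shuffled without cache: {plain}, with cache: {cached}")
```

Both runs compute the same ranks. Only the amount of regrouped data differs, and the gap widens as the invariant table grows relative to the data that changes per iteration. The counts here are properties of this toy model and say nothing about the figures reported for HaLoop.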