Publication | Closed Access
Cost-based Fault-tolerance for Parallel Data Processing
Citations: 22
References: 12
Year: 2015
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Map-reduce, Data Science, Management, Data Integration, Parallel Computing, Different Fault-tolerance Schemes, Data Management, Parallel Database, Computer Engineering, Computer Science, Distributed Query Processing, Cost-based Fault-tolerance, Query Optimization, Parallel Processing, Cloud Computing, Parallel Performance Evaluation, Parallel Programming, Cost-based Fault-tolerance Scheme, Data-level Parallelism, Fine-grained Fault-tolerance Scheme, Massive Data Processing, Big Data
To deal with mid-query failures, parallel data engines (PDEs) implement different fault-tolerance schemes today: (1) parallel databases typically implement fault tolerance in a coarse-grained manner, restarting a query completely when a mid-query failure occurs, and (2) modern MapReduce-style PDEs implement a fine-grained fault-tolerance scheme, which either materializes intermediate results or uses a lineage model to recover from mid-query failures. However, neither scheme efficiently handles mixed workloads that combine short-running interactive queries with long-running batch queries, nor does either efficiently support a wide range of cluster setups that vary in cluster size and other parameters such as the mean time between failures. In this paper, we present a novel cost-based fault-tolerance scheme that tackles this issue. Unlike the existing schemes, our scheme selects a subset of intermediates to materialize such that the total query runtime is minimized under mid-query failures. Our experiments show that our cost-based fault-tolerance scheme outperforms all existing strategies and always selects the sweet spot for short- and long-running queries as well as for different cluster setups.
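The idea of choosing which intermediates to materialize can be illustrated with a toy model. The sketch below is not the paper's cost model; it is a minimal illustration under stated assumptions: the query is a linear pipeline of operators, failures arrive as a Poisson process with rate 1/MTBF, a failure restarts execution from the last materialized intermediate, and recovery itself has no extra overhead. All function names and parameters (`expected_segment_time`, `expected_runtime`, `best_materialization`, `runtimes`, `mat_costs`) are hypothetical.

```python
from itertools import combinations
from math import expm1


def expected_segment_time(work, mtbf):
    """Expected time to finish `work` seconds without interruption when
    every failure (Poisson, rate 1/mtbf) restarts the segment from scratch."""
    rate = 1.0 / mtbf
    return expm1(rate * work) / rate


def expected_runtime(runtimes, mat_costs, materialized, mtbf):
    """Expected runtime of a linear pipeline (toy model, an assumption).

    `materialized` holds the indices of operators whose output is written
    out; each such write ends a restartable segment."""
    total, segment = 0.0, 0.0
    last = len(runtimes) - 1
    for i, run in enumerate(runtimes):
        segment += run
        if i in materialized and i != last:
            segment += mat_costs[i]  # the write must also finish failure-free
            total += expected_segment_time(segment, mtbf)
            segment = 0.0
    # The final segment always contains the last operator.
    return total + expected_segment_time(segment, mtbf)


def best_materialization(runtimes, mat_costs, mtbf):
    """Brute-force search over all subsets of intermediates (not the final result)."""
    candidates = range(len(runtimes) - 1)
    best = None
    for k in range(len(runtimes)):
        for subset in combinations(candidates, k):
            cost = expected_runtime(runtimes, mat_costs, set(subset), mtbf)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best
```

Calling `best_materialization([100, 100, 100], [1, 1, 1], mtbf)` with a very large `mtbf` selects no intermediates, which corresponds to the coarse-grained restart scheme, while an `mtbf` comparable to the operator runtimes selects every intermediate, which corresponds to the fine-grained scheme. Exhaustive enumeration is exponential in the number of operators and is used here only to keep the sketch short.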