Publication | Closed Access
Spotgres - parallel data analytics on Spot Instances
Citations: 10
References: 10
Year: 2015
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Fault Tolerance, MapReduce, Distributed Data Analytics, Typical PDE Architecture, Datacenter-scale Computing, Cluster Technology, Data Science, Systems Engineering, Parallel Computing, Data Management, High-performance Data Analytics, EC2 Spot Instances, Computer Engineering, Parallel Data Analytics, Computer Science, Scalable Computing, Edge Computing, Cloud Computing, Parallel Programming, Distributed Data Store, Novel PDE, Big Data
Market-based IaaS offers such as Amazon's EC2 Spot Instances represent a cost-efficient way to operate a cluster. Whereas traditional IaaS offers follow a fixed pricing scheme, the per-hour price of Spot Instances changes dynamically and is often significantly lower than that of On-Demand and even Reserved Instances. When deploying a Parallel Data-Processing Engine (PDE) on a cluster of Spot Instances, a major obstacle is finding a bidding strategy that is optimal for a given workload and satisfies user constraints such as a maximum budget. A further obstacle is that existing PDEs implement rigid fault-tolerance schemes that do not adapt to the different failure rates resulting from different bidding strategies. In this paper, we present a novel PDE called Spotgres that tackles these issues. Spotgres extends a typical PDE architecture with (1) a constraint-based bid advisor, which finds an optimal cluster configuration (i.e., a set of bids on Spot Instances), and (2) a cost-based fault-tolerance scheme, which takes various parameters (such as the mean time between failures and query statistics) into account to efficiently execute analytical queries over a set of Spot Instances with varying failure rates.
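To make the interplay between bidding and fault tolerance concrete, the following is a minimal sketch and not the algorithm used by Spotgres. It assumes exponentially distributed failures, perfect linear speedup, and a fixed per-segment checkpoint overhead. All names (`ClusterConfig`, `expected_runtime_hours`, `advise`) and all numbers are hypothetical and chosen only for illustration. Under these assumptions, a job of length w on a cluster with mean time between failures M that restarts from scratch after each failure has expected completion time M(e^(w/M) - 1), so splitting the job into checkpointed segments can pay off when M is small.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical candidate configuration (one possible set of bids)."""
    name: str
    nodes: int
    price_per_node_hour: float  # assumed average Spot price in USD
    mtbf_hours: float           # assumed mean time between failures


def expected_runtime_hours(work_hours: float, mtbf_hours: float) -> float:
    """Expected time to finish work that restarts from scratch on failure.

    Assumes exponentially distributed failures and zero restart delay.
    """
    return mtbf_hours * math.expm1(work_hours / mtbf_hours)


def expected_runtime_with_checkpoints(work_hours, mtbf_hours, segments,
                                      overhead_hours):
    """Split the work into equal segments; only a failed segment is redone."""
    segment = work_hours / segments + overhead_hours
    return segments * expected_runtime_hours(segment, mtbf_hours)


def advise(configs, work_node_hours, max_budget,
           segment_options=(1, 2, 4, 8), checkpoint_overhead_hours=0.05):
    """Return (runtime, cost, config, segments) with the lowest expected
    runtime whose expected cost fits the budget, or None if none fits."""
    best = None
    for cfg in configs:
        work_hours = work_node_hours / cfg.nodes  # assumes linear speedup
        for segments in segment_options:
            # No checkpoint overhead when the job runs as a single segment.
            overhead = 0.0 if segments == 1 else checkpoint_overhead_hours
            runtime = expected_runtime_with_checkpoints(
                work_hours, cfg.mtbf_hours, segments, overhead)
            cost = runtime * cfg.nodes * cfg.price_per_node_hour
            if cost <= max_budget and (best is None or runtime < best[0]):
                best = (runtime, cost, cfg, segments)
    return best


if __name__ == "__main__":
    candidates = [
        ClusterConfig("cheap", nodes=4, price_per_node_hour=0.10, mtbf_hours=2.0),
        ClusterConfig("stable", nodes=4, price_per_node_hour=0.30, mtbf_hours=1000.0),
    ]
    print(advise(candidates, work_node_hours=16.0, max_budget=3.0))
```

In this toy model, a tight budget tends to push the choice toward the cheaper, less reliable configuration combined with more frequent checkpoints, while a looser budget favors the reliable configuration with no checkpoints. This illustrates why a fault-tolerance scheme that adapts to the failure rate implied by the bids can be useful.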