Publication | Closed Access
Fault-Tolerant Scheduling for Real-Time Scientific Workflows with Elastic Resource Provisioning in Virtualized Clouds
Citations: 128 | References: 24 | Year: 2016
Cluster Computing, Engineering, Elastic Resource Provisioning, Cloud Computing Architecture, Computer Architecture, Software Engineering, Cloud Resource Management, Systems Engineering, Virtualized Clouds, Parallel Computing, Job Scheduler, Cloud Scheduling, Virtualized Infrastructure, Computer Engineering, Computer Science, Real-time Scientific Workflows, Scheduling Analysis, Workflow Execution, Scientific Workflow System, Edge Computing, Cloud Computing, Real-time Systems, Parallel Programming, Scientific Workflow Applications, Workflow Requests
Clouds are becoming an important platform for scientific workflow applications. However, with many nodes deployed in clouds, managing the reliability of resources becomes a critical issue, especially for real-time scientific workflow execution, where deadlines must be met. Fault tolerance in clouds is therefore essential. Primary-backup (PB) scheduling is a popular fault-tolerance technique and has been used effectively in cluster and grid computing. However, applying this technique to real-time workflows in a virtualized cloud is much more complicated and has rarely been studied. In this paper, we address this problem. We first establish a real-time workflow fault-tolerant model that extends the traditional PB model by incorporating cloud characteristics. Based on this model, we develop approaches for task allocation and message transmission that ensure faults can be tolerated during workflow execution. Finally, we propose a dynamic fault-tolerant scheduling algorithm, FASTER, for real-time workflows in the virtualized cloud. FASTER has three key features: 1) it employs a backward shifting method to make full use of idle resources and incorporates task overlapping and VM migration for high resource utilization; 2) it applies a vertical/horizontal scaling-up technique to quickly provision resources for a burst of workflows; and 3) it uses a vertical scaling-down scheme to avoid unnecessary and ineffective resource changes caused by fluctuating workflow requests. We evaluate FASTER with synthetic workflows and workflows collected from real scientific and business applications, and compare it with six baseline algorithms. The experimental results demonstrate that FASTER can effectively improve resource utilization and schedulability even in the presence of node failures in virtualized clouds.
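To make the primary-backup idea mentioned in the abstract concrete, the following is a minimal sketch of generic PB placement, not the FASTER algorithm itself. It assumes a passive backup that starts only after the primary's planned finish time, places the two copies on different hosts, and rejects a task when the backup could not finish by the deadline. The names `Task` and `schedule_pb`, the greedy earliest-deadline-first order, and the flat host model are all assumptions made for illustration; VMs, message transmission, task overlapping, backward shifting, and elastic scaling are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    exec_time: float  # assumed identical on every host (hypothetical simplification)
    deadline: float


def schedule_pb(tasks, hosts):
    """Greedy primary-backup placement (illustrative only, not FASTER).

    Returns {task name: (primary_host, primary_finish, backup_host, backup_finish)},
    with None for a task whose two copies cannot both meet the deadline.
    """
    free_at = {h: 0.0 for h in hosts}  # time at which each host becomes idle
    plan = {}
    for t in sorted(tasks, key=lambda t: t.deadline):  # earliest deadline first
        # Primary copy: the host that becomes idle first.
        p_host = min(hosts, key=lambda h: free_at[h])
        p_finish = free_at[p_host] + t.exec_time

        # Backup copy: must live on a different host so one host failure
        # cannot take out both copies.
        candidates = [h for h in hosts if h != p_host]
        if not candidates:
            plan[t.name] = None
            continue
        # Passive backup: it may start only after the primary's planned finish.
        b_host = min(candidates, key=lambda h: max(free_at[h], p_finish))
        b_finish = max(free_at[b_host], p_finish) + t.exec_time

        if b_finish > t.deadline:
            plan[t.name] = None  # reject: deadline cannot be guaranteed under a fault
            continue

        # Commit both reservations.
        free_at[p_host] = p_finish
        free_at[b_host] = b_finish
        plan[t.name] = (p_host, p_finish, b_host, b_finish)
    return plan


if __name__ == "__main__":
    demo = [Task("A", 2, 10), Task("B", 3, 6), Task("C", 5, 9)]
    for name, slot in schedule_pb(demo, ["h1", "h2"]).items():
        print(name, slot)
```

In this sketch every backup holds its reservation even when the primary succeeds, which wastes capacity. The abstract indicates that FASTER targets this kind of waste through backward shifting, task overlapping, and VM migration.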