Publication | Closed Access
Morpheus: towards automated SLOs for enterprise clusters
Citations: 172
References: 44
Year: 2016
Cluster Computing, Engineering, Enterprise Clusters, Performance Predictability, Cluster Footprint, Data Science, Computing Systems, Distributed Environment, High Cluster Utilization, Parallel Computing, Data Management, Job Scheduler, Cloud Scheduling, Scheduling (Computing), Computer Science, Scalable Computing, Operating Systems, Distributed Computing, Distributed Middleware, Cloud Computing, Scheduling (Operating Systems), Real-time Systems, Scheduling (Project Management), System Software, Workload Management, Resource Optimization
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and predictable job performance, coveted by operators and users respectively. We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data; 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability; and 3) mitigates inherent performance variance (e.g., due to failures) by dynamically reprovisioning jobs. We validate these ideas against production traces from a 50k-node cluster and show that Morpheus can lower the number of deadline violations by 5× to 13×, while retaining cluster utilization and lowering cluster footprint by 14% to 28%. We demonstrate the scalability and practicality of our implementation by deploying Morpheus on a 2700-node cluster and running it against production-derived workloads.
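
To make the first idea concrete, here is a minimal sketch of how a completion-time SLO might be derived from a recurring job's past runs. The abstract does not describe Morpheus's actual inference procedure, so everything here is an assumption for illustration: the nearest-rank percentile rule, the `RunRecord` type, and the function names `infer_deadline_slo` and `count_violations` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One historical run of a recurring job (hypothetical record type)."""
    submit_time: float  # seconds since some epoch
    finish_time: float  # seconds since the same epoch


def infer_deadline_slo(history: list[RunRecord], percentile: int = 95) -> float:
    """Return a deadline, in seconds after submission, that the given
    percentage of past runs met (nearest-rank percentile).

    Assumption: a percentile of observed durations stands in for the
    "implicit user expectation"; the real system may infer SLOs differently.
    """
    if not history:
        raise ValueError("need at least one historical run")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be in 1..100")
    durations = sorted(r.finish_time - r.submit_time for r in history)
    # Nearest-rank: ceil(percentile * n / 100), computed in integers.
    rank = (percentile * len(durations) + 99) // 100
    return durations[rank - 1]


def count_violations(history: list[RunRecord], deadline: float) -> int:
    """Count past runs that took longer than the deadline."""
    return sum(1 for r in history if r.finish_time - r.submit_time > deadline)


# Five past runs of one periodic job; the last one was slowed down,
# for example by a failure or by contention with other jobs.
history = [
    RunRecord(0.0, 100.0),
    RunRecord(0.0, 110.0),
    RunRecord(0.0, 120.0),
    RunRecord(0.0, 130.0),
    RunRecord(0.0, 500.0),
]
deadline = infer_deadline_slo(history, percentile=80)
violations = count_violations(history, deadline)
```

With these made-up runs, the 80th-percentile deadline is 130 seconds and only the outlier run exceeds it. A scheduler that enforces such a deadline would then need to reserve enough resources for the job to finish on time, which is the part this sketch leaves out.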