Publication | Open Access
Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters
Citations: 75
References: 15
Year: 2016
Unknown Venue
Keywords: PFS Level, Cluster Computing, Storage Performance, Engineering, Computer Architecture, Parallel Storage, High Performance Computing, Disk Storage, Parallel Computing, Parallel File System, Data Management, Job Scheduler, Hybrid HPC Workload, Computer Engineering, Scheduling (Computing), Computer Science, Burst Buffers, Edge Computing, Cloud Computing, Parallel Programming, In-storage Computing
The economics of flash vs. disk storage is driving HPC centers to incorporate faster solid-state burst buffers into the storage hierarchy in exchange for smaller parallel file system (PFS) bandwidth. In systems with an underprovisioned PFS, avoiding I/O contention at the PFS level will become crucial to achieving high computational efficiency. In this paper, we propose novel batch job scheduling techniques that reduce such contention by integrating I/O awareness into scheduling policies such as EASY backfilling. We model the available bandwidth of links between each level of the storage hierarchy (i.e., burst buffers, I/O network, and PFS), and our I/O-aware schedulers use this model to avoid contention at any level in the hierarchy. We integrate our approach into Flux, a next-generation resource and job management framework, and evaluate the effectiveness and computational costs of our I/O-aware scheduling. Our results show that by reducing I/O contention for underprovisioned PFSes, our solution reduces job performance variability by up to 33% and decreases I/O-related utilization losses by up to 21%, which ultimately increases the amount of science performed by scientific workloads.
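The idea of checking a job's I/O demand against every level of the storage hierarchy before starting it can be illustrated with a minimal sketch. This is not the Flux implementation or the paper's algorithm: the class names, the `fits` and `schedule` functions, the per-job bandwidth estimate, and all capacity figures are hypothetical, and the loop is a simple greedy pass rather than EASY backfilling with reservations.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One level of the storage hierarchy (hypothetical model)."""
    name: str
    capacity_gbps: float
    allocated_gbps: float = 0.0

    def free(self) -> float:
        return self.capacity_gbps - self.allocated_gbps


@dataclass
class Job:
    name: str
    nodes: int
    io_gbps: float  # assumed: an estimate of the job's sustained I/O bandwidth


def fits(job: Job, free_nodes: int, path: list[Link]) -> bool:
    """A job fits only if nodes are available AND no link would be oversubscribed."""
    return job.nodes <= free_nodes and all(
        link.free() >= job.io_gbps for link in path
    )


def schedule(queue: list[Job], free_nodes: int, path: list[Link]) -> list[str]:
    """Greedy pass over the queue: start every job that fits, skip the rest."""
    started = []
    for job in queue:
        if fits(job, free_nodes, path):
            for link in path:
                link.allocated_gbps += job.io_gbps
            free_nodes -= job.nodes
            started.append(job.name)
    return started


# Illustrative numbers only: the PFS is the underprovisioned level.
path = [Link("burst_buffer", 100.0), Link("io_network", 50.0), Link("pfs", 20.0)]
queue = [Job("a", 4, 15.0), Job("b", 4, 10.0), Job("c", 2, 5.0)]
started = schedule(queue, free_nodes=16, path=path)
print(started)
```

In this example, job `b` is held back even though enough nodes are free, because starting it would push the PFS link past its capacity; the smaller job `c` is started in its place. A node-only scheduler would have started all three and let them contend for PFS bandwidth.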