Publication | Closed Access

Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning

2020 | 133 Citations | 27 References

Abstract

We present Gandivafair, a distributed, fair-share scheduler that balances the conflicting goals of efficiency and fairness in GPU clusters for deep learning training (DLT). Gandivafair provides performance isolation between users, which lets multiple users share a single cluster and thereby maximizes cluster efficiency. Gandivafair is the first scheduler that allocates cluster-wide GPU time fairly among active users.
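As a rough illustration of what "allocating cluster-wide GPU time fairly among active users" can mean, the sketch below computes a max-min fair (water-filling) split of a cluster's GPUs for one scheduling interval. Fractional allocations stand for shares of GPU time over that interval. This is a generic textbook fairness policy, not Gandivafair's own algorithm, and the function name `max_min_fair_share` and the per-user demand model are hypothetical.

```python
from typing import Dict

def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical max-min fair allocation of GPU capacity among active users.

    Capacity is split equally among users that still want more; any user whose
    demand is met drops out, and its unused share is redistributed to the rest.
    """
    alloc = {user: 0.0 for user in demands}
    # Only users with positive demand count as "active".
    active = {u for u, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = set()
        for u in active:
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            remaining -= give
            if demands[u] - alloc[u] <= 1e-12:
                satisfied.add(u)
        if not satisfied:
            # Every active user absorbed a full equal share; capacity is exhausted.
            break
        active -= satisfied
    return alloc

if __name__ == "__main__":
    # 8 GPUs shared by three users with different demands.
    print(max_min_fair_share(8, {"A": 2, "B": 4, "C": 10}))
```

With 8 GPUs and demands of 2, 4, and 10, user A is capped at its demand of 2 and the remaining 6 GPUs are split evenly between B and C (about 3 each, up to floating-point rounding). A real DLT scheduler must additionally deal with jobs that need whole GPUs, time-slicing, and migration, which this sketch ignores.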
