Publication | Open Access
Hoplite
15 Citations | 32 References | Year: 2021 | Unknown Venue
Topics: Cluster Computing, Collective Communication Efficiency, Engineering, Distributed Computing, Cloud Computing, Dynamic Workloads, Distributed Environment, Systems Engineering, Parallel Programming, Distributed Systems, Computer Science, Distributed Data Processing, Parallel Computing, Distributed Model, Distributed Processing
Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving. As more data-intensive applications run on top of task-based systems, the efficiency of collective communication has become an important problem. Unfortunately, traditional collective communication libraries (e.g., MPI, Horovod, NCCL) are a poor fit: they require the communication schedule to be fixed before runtime, and they do not provide fault tolerance.
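To make the "schedule fixed before runtime" point concrete, here is a toy sketch (not the design described in this work) of a sum-reduce over hypothetical workers whose finish times are not known in advance. The names `worker`, `static_reduce`, `dynamic_reduce`, and `run` are illustrative assumptions. The static version combines results in a precomputed rank order, the way a fixed collective schedule would, so it stalls on a slow early rank. The dynamic version uses the standard-library `asyncio.as_completed` to combine results in whatever order they finish.

```python
import asyncio


# Hypothetical worker: produces a partial result after a delay that the
# reducer does not know in advance (simulating a dynamic task workload).
async def worker(rank: int, delay: float) -> int:
    await asyncio.sleep(delay)
    return rank + 1  # placeholder partial result for this rank


async def static_reduce(tasks: list[asyncio.Task]) -> tuple[int, list[int]]:
    """Combine results in a fixed, precomputed order (rank 0, 1, 2, ...).

    Each step blocks on the next rank even if later ranks already finished.
    """
    total, order = 0, []
    for task in tasks:
        value = await task
        total += value
        order.append(value)
    return total, order


async def dynamic_reduce(tasks: list[asyncio.Task]) -> tuple[int, list[int]]:
    """Combine results in the order they complete, whatever that turns out to be."""
    total, order = 0, []
    for next_done in asyncio.as_completed(tasks):
        value = await next_done
        total += value
        order.append(value)
    return total, order


def run(reducer, delays: list[float]) -> tuple[int, list[int]]:
    """Start one worker per delay, then reduce their outputs with `reducer`."""
    async def main() -> tuple[int, list[int]]:
        tasks = [asyncio.create_task(worker(r, d)) for r, d in enumerate(delays)]
        return await reducer(tasks)

    return asyncio.run(main())


if __name__ == "__main__":
    delays = [0.3, 0.1, 0.2, 0.0]  # rank 0 is the straggler
    print("static: ", run(static_reduce, delays))
    print("dynamic:", run(dynamic_reduce, delays))
```

With these delays both reducers return the same sum, 10. The static one combines nothing until rank 0 finishes at about 0.3 s, while the dynamic one starts combining as soon as rank 3 finishes. That gap is the cost of committing to an order before knowing when each input will be ready.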