Publication | Closed Access

Roar: A Router Microarchitecture for In-network Allreduce

Citations: 11
References: 23
Year: 2023

Abstract

The allreduce operation is the most commonly used collective operation in distributed or parallel applications. It aggregates data collected from distributed hosts and broadcasts the aggregated result back to them. In-network computing can accelerate allreduce by offloading this operation to network devices. However, existing in-network solutions struggle to achieve high throughput, to aggregate large messages efficiently, and to produce repeatable results. In this work, we propose a simple and effective router microarchitecture for in-network allreduce, which uses an RDMA protocol to improve throughput. We further discuss strategies to tackle the aforementioned challenges. Experiments show that our approach not only has advantages over state-of-the-art in-network solutions, but also accelerates allreduce to a near-optimal level compared with host-based algorithms.
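To make the allreduce semantics described in the abstract concrete, the following is a minimal host-side sketch in Python. It is not the router microarchitecture proposed in the paper. The function name `allreduce_sum` and the choice of summation as the reduction are assumptions made for illustration. The sketch also hints at why repeatable results can be hard to guarantee: floating-point addition is not associative, so the order in which contributions are aggregated can change the result. Treating this as the source of the repeatability challenge is an assumption here, since the abstract does not state the cause.

```python
from typing import List


def allreduce_sum(host_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate allreduce with a sum reduction (hypothetical helper).

    Each inner list is the vector contributed by one host. The return
    value holds one copy of the aggregated vector per host.
    """
    if not host_buffers:
        return []
    length = len(host_buffers[0])
    if any(len(buf) != length for buf in host_buffers):
        raise ValueError("all hosts must contribute vectors of the same length")

    # Aggregation step: element-wise sum, visiting hosts in list order.
    reduced = [0.0] * length
    for buf in host_buffers:
        for i, value in enumerate(buf):
            reduced[i] += value

    # Broadcast step: every host receives its own copy of the result.
    return [list(reduced) for _ in host_buffers]


if __name__ == "__main__":
    # Three hosts, each contributing a two-element vector.
    print(allreduce_sum([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))

    # Same contributions, different aggregation order: the sums differ
    # because 1e16 + 1.0 rounds back to 1e16 in double precision.
    order_a = allreduce_sum([[1e16], [1.0], [-1e16]])[0][0]
    order_b = allreduce_sum([[1e16], [-1e16], [1.0]])[0][0]
    print(order_a, order_b)  # 0.0 1.0
```

In this sketch the aggregation order is fixed by the list order, so repeated runs agree. A device that aggregates contributions in arrival order would not have that guarantee unless it enforces an ordering by some other means.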
