
Publication | Closed Access

RDMA over Commodity Ethernet at Scale

Citations: 433

References: 31

Year: 2016

TLDR

Over the past one and a half years, Microsoft has used RDMA over commodity Ethernet (RoCEv2) to support highly reliable, latency‑sensitive services. This paper describes the challenges encountered and the solutions devised to address them. We designed a DSCP‑based priority flow‑control mechanism to scale RoCEv2 beyond VLAN and built monitoring and management systems to ensure reliable operation. We resolved PFC‑induced deadlock, RDMA transport livelock, and NIC pause‑frame storms, showing that RoCEv2 can replace TCP for intra‑data‑center traffic with low latency, low CPU overhead, and high throughput.

Abstract

Over the past one and a half years, we have been using RDMA over commodity Ethernet (RoCEv2) to support some of Microsoft's highly reliable, latency-sensitive services. This paper describes the challenges we encountered during the process and the solutions we devised to address them. In order to scale RoCEv2 beyond VLAN, we have designed a DSCP-based priority flow-control (PFC) mechanism to ensure large-scale deployment. We have addressed the safety challenges brought by PFC-induced deadlock (yes, it happened!), RDMA transport livelock, and the NIC PFC pause frame storm problem. We have also built the monitoring and management systems to make sure RDMA works as expected. Our experiences show that the safety and scalability issues of running RoCEv2 at scale can all be addressed, and RDMA can replace TCP for intra-data-center communications and achieve low latency, low CPU overhead, and high throughput.
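
The abstract names DSCP-based PFC without showing what changes relative to VLAN-based PFC. As a rough illustration only, the sketch below assumes the packet priority is read from the 6-bit DSCP field of the IP header rather than from the 3-bit PCP field of an 802.1Q VLAN tag. The mapping table, the default priority, and all function names are hypothetical and are not taken from the paper.

```python
# Hypothetical DSCP -> PFC priority table. The values are illustrative
# assumptions, not the configuration described in the paper.
DSCP_TO_PRIORITY = {26: 3, 48: 6}
DEFAULT_PRIORITY = 0  # assumed fallback class for unmapped DSCP values


def dscp_from_tos(tos: int) -> int:
    """Return the 6-bit DSCP from the 8-bit IPv4 ToS / IPv6 Traffic Class byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS byte must be in 0..255")
    return tos >> 2  # the low 2 bits carry ECN, not DSCP


def pcp_from_vlan_tci(tci: int) -> int:
    """Return the 3-bit PCP from a 16-bit 802.1Q Tag Control Information field."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must be in 0..65535")
    return tci >> 13  # PCP occupies the top 3 bits of the TCI


def pfc_priority_for_packet(tos: int) -> int:
    """Pick a PFC priority class from the IP header alone (no VLAN tag needed)."""
    return DSCP_TO_PRIORITY.get(dscp_from_tos(tos), DEFAULT_PRIORITY)
```

Because the priority in this sketch travels in the IP header, it survives layer-3 routing, whereas a VLAN tag is a layer-2 construct; this is one plausible reading of "beyond VLAN" in the abstract.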
