Stanford University Networking Seminar
In recent years, the use of RDMA in datacenter networks has increased significantly, with RoCE (RDMA over Converged Ethernet) emerging as the canonical approach to deploying RDMA in Ethernet-based datacenters. RoCE NICs achieve good performance only when run over a lossless network, which is provided by Ethernet's Priority Flow Control (PFC) mechanism. However, PFC introduces significant problems, such as head-of-line blocking, congestion spreading, and occasional deadlocks.
In this work, we ask: is PFC fundamentally required for deploying RDMA over Ethernet, or is its use merely an artifact of the current RoCE NIC design? We find that while PFC is indeed needed for current RoCE NICs, it is unnecessary (and sometimes significantly harmful) when one updates RoCE NICs to a more appropriate (yet still feasible) design. Thus, our findings suggest that to avoid the many problems with PFC in RDMA datacenters, we should adopt this new RoCE NIC design.
Radhika Mittal is a PhD student at UC Berkeley, advised by Prof. Sylvia Ratnasamy and Prof. Scott Shenker. While she is broadly interested in computer networking, her work focuses on packet transport (scheduling and congestion control) across different scenarios.