FNCC: Fast Notification Congestion Control in Data Center Networks
Jing Xu, Zhan Wang, Fan Yang, Ning Kang, Zhenlong Ma, Guojun Yuan, Guangming Tan, Ninghui Sun
TL;DR
FNCC addresses slow congestion response in data-center networks by delivering in-network telemetry via return-path ACKs, achieving sub-RTT notification and faster rate adjustment. It adds a last-hop concurrency signal so receivers inform senders of concurrent congested flows, enabling quick convergence to a fair rate. The design comprises CP at switches, RP at senders, and ACK generation at receivers, with practical hardware feasibility demonstrated and significant improvements in queue depth, pause-frame reduction, utilization, and flow completion time across 100–400 Gbps tests and large-scale simulations. This approach offers a practical, scalable enhancement to HPCC/DCQCN for RoCEv2-based data centers, reducing tail latency and improving fairness under diverse congestion scenarios.
Abstract
Congestion control plays a pivotal role in large-scale data centers, facilitating ultra-low latency, high bandwidth, and optimal utilization. Even with the deployment of data center congestion control mechanisms such as DCQCN and HPCC, these algorithms often respond to congestion sluggishly. This sluggishness stems primarily from slow congestion notification: it takes almost one round-trip time (RTT) for the congestion information to reach the sender. In this paper, we introduce the Fast Notification Congestion Control (FNCC) mechanism, which achieves sub-RTT notification. FNCC leverages the acknowledgment packet (ACK) on the return path to carry in-network telemetry (INT) information of the request path, offering the sender more timely and accurate INT. To further accelerate the responsiveness of last-hop congestion control, we propose that the receiver notify the sender of the number of concurrent congested flows, which can be used to adjust the congested flows to a fair rate quickly. Our experimental results demonstrate that FNCC reduces flow completion time by 27.4% and 88.9% compared to HPCC and DCQCN, respectively. Moreover, FNCC triggers minimal pause frames and maintains high utilization even at 400 Gbps.
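
The two ideas in the abstract can be illustrated with a rough back-of-the-envelope sketch. It is not the paper's algorithm: the function names, the symmetric per-link delays, the omission of queueing and processing time, and the equal-split definition of the fair rate are all assumptions made here for illustration.

```python
def notification_delay_us(link_delays_us, switch_index, via_return_path):
    """Idealized time for telemetry stamped at a switch to reach the sender.

    Assumptions (not from the paper): link_delays_us[i] is the one-way delay
    of link i, identical in both directions; link 0 touches the sender and
    the last link touches the receiver; switch k sits between link k-1 and
    link k; queueing and processing delays are ignored.
    """
    if not 1 <= switch_index < len(link_delays_us):
        raise ValueError("switch_index must name a switch between two links")
    switch_to_sender = sum(link_delays_us[:switch_index])
    switch_to_receiver = sum(link_delays_us[switch_index:])
    if via_return_path:
        # A returning ACK picks up the telemetry at the switch and only has
        # to travel the remaining hops back to the sender.
        return switch_to_sender
    # Telemetry rides a data packet to the receiver, then an ACK carries it
    # back along the whole path.
    return switch_to_receiver + sum(link_delays_us)


def fair_share_gbps(line_rate_gbps, concurrent_flows):
    """Equal split of the last-hop link among the flows the receiver reports.

    Assumption: "fair rate" is modeled here as line rate divided by the
    number of concurrent congested flows.
    """
    if concurrent_flows < 1:
        raise ValueError("concurrent_flows must be at least 1")
    return line_rate_gbps / concurrent_flows


if __name__ == "__main__":
    links = [1, 1, 1, 1]  # hypothetical one-way link delays in microseconds
    print(notification_delay_us(links, 1, via_return_path=False))
    print(notification_delay_us(links, 1, via_return_path=True))
    print(fair_share_gbps(400, 4))
```

In this toy four-link path with an 8 µs RTT, congestion at the first switch takes 7 µs to reach the sender when telemetry is collected on the request path, and 1 µs when a returning ACK collects it. That matches the "almost one RTT" versus "sub-RTT" contrast drawn in the abstract.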
