
SeqBalance: Congestion-Aware Load Balancing with no Reordering for RoCE

Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, Tao Huang

TL;DR

SeqBalance addresses the underutilization of RoCE (RDMA over Converged Ethernet) networks under AI-training workloads by introducing end-to-end, congestion-aware load balancing at ToR switches while preserving kernel-bypass benefits and compatibility with commercial RNICs. The design pairs a SeqBalance Shaper, which splits each RDMA WQE (work queue element) into $N$ sub-WQEs, with a congestion-driven, in-network routing protocol that uses Congestion Packets and a Congestion Table to steer traffic away from congested paths, all without modifying upper-layer software. The result is multi-path transmission without RDMA packet reordering. Hardware/software prototypes on Mellanox CX-6 RNICs and Intel Tofino switches show substantial reductions in average and 99th-percentile flow completion times (FCT) across representative workloads and topologies. Overall, SeqBalance offers a practical, scalable approach to RDMA load balancing that leverages existing data-center infrastructure to improve link utilization and application performance.
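The two mechanisms above, splitting a WQE into sub-WQEs and avoiding paths flagged by Congestion Packets, can be illustrated with a small sketch. This is not the authors' implementation: the `WQE` fields, the even byte split, the `hold_time` expiry rule, and the fallback to a hashed path are hypothetical choices made for illustration. The sketch also assumes that a path is chosen once per sub-WQE and kept for all of that sub-WQE's packets, which is one way multi-path transmission can avoid reordering within each sub-WQE.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WQE:
    """Simplified RDMA work request (hypothetical fields): a contiguous buffer to send."""
    addr: int    # start address of the local buffer
    length: int  # number of bytes to transfer


def split_wqe(wqe: WQE, n: int) -> list[WQE]:
    """Split one WQE into up to n contiguous sub-WQEs whose lengths differ by at most one byte."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Never create more sub-WQEs than there are bytes (an empty WQE stays a single WQE).
    n = min(n, wqe.length) if wqe.length > 0 else 1
    base, extra = divmod(wqe.length, n)
    subs, offset = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` pieces get one extra byte
        subs.append(WQE(wqe.addr + offset, size))
        offset += size
    return subs


class CongestionTable:
    """Hypothetical ToR-side table: marks an uplink as congested for `hold_time` after a Congestion Packet."""

    def __init__(self, num_paths: int, hold_time: float):
        self.expiry = [0.0] * num_paths  # time until which each path is treated as congested
        self.hold_time = hold_time

    def on_congestion_packet(self, path: int, now: float) -> None:
        # Record that `path` reported congestion; avoid it until now + hold_time.
        self.expiry[path] = now + self.hold_time

    def pick_path(self, flow_hash: int, now: float) -> int:
        # Start from the hashed path and take the first path not currently marked congested.
        n = len(self.expiry)
        start = flow_hash % n
        for i in range(n):
            p = (start + i) % n
            if self.expiry[p] <= now:
                return p
        return start  # every path is congested: fall back to the hashed path
```

For example, splitting a 10-byte WQE into three sub-WQEs yields lengths 4, 3 and 3 at consecutive offsets. After a Congestion Packet for path 1, `pick_path` skips that path until its hold time expires and then returns to it.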

Abstract

Remote Direct Memory Access (RDMA) is widely used in data center networks because of its high performance. However, because of RDMA's retransmission strategy and the traffic patterns of AI training, current data center load balancing schemes are unsuitable for RDMA. In this paper, we propose SeqBalance, a load balancing framework designed for RDMA. SeqBalance achieves fine-grained load balancing for RDMA without causing packet reordering. All of SeqBalance's components are built on existing commercial RNICs and commercial programmable switches, so it is compatible with existing data center networks. We have implemented SeqBalance on Mellanox CX-6 RNICs and Tofino switches. Hardware testbed experiments and large-scale simulations show that, compared with existing load balancing schemes, SeqBalance improves average flow completion time (FCT) by 18.7% and 99th-percentile FCT by 33.2%.
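The results above are stated as improvements in average and 99th-percentile FCT. The sketch below shows one way to compute these metrics and a relative improvement from per-flow completion times. The function names are hypothetical, and the nearest-rank percentile is an assumed definition, since the abstract does not say which percentile method the evaluation used.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile (an assumed definition; other interpolation rules exist)."""
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values)
    # p * n / 100 keeps integer inputs exact before rounding up to a 1-based rank.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def fct_summary(fcts):
    """Return (average FCT, 99th-percentile FCT) for a list of per-flow completion times."""
    return mean(fcts), percentile(fcts, 99)


def improvement(baseline, candidate):
    """Relative FCT reduction of `candidate` versus `baseline`, as a fraction of the baseline."""
    return (baseline - candidate) / baseline
```

Under this definition, a 99th-percentile FCT that drops from a baseline value $B$ to $(1 - 0.332)\,B$ corresponds to a 33.2% improvement.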

Paper Structure

This paper contains 16 sections, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Flowlet characteristics in TCP and RDMA.
  • Figure 2: Overview of SeqBalance.
  • Figure 3: SeqBalance Shaper.
  • Figure 4: CQE Generation Scheme of RDMA.
  • Figure 5: CQE Generation Scheme after SeqBalance Shaper.
  • ...and 9 more figures