SMaRTT: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport

Tommaso Bonato, Abdul Kabbani, Ahmad Ghalayini, Anup Agarwal, Daniele De Sensi, Rong Pan, Costin Raiciu, Mark Handley, Mihai Brodschi, Timo Schneider, Nils Blach, Daniel Santos Ferreira Alves, Torsten Hoefler

TL;DR

SMaRTT tackles the challenge of high-throughput, low-latency congestion control for AI- and HPC-centric datacenters by combining sender-based window control with ECN, delay feedback, and optional packet trimming. The design introduces QuickAdapt for rapid reaction, Fair Increase for equitable sharing, and a tightly coupled load-balancer integration to optimize path utilization across multipath environments. Through extensive simulations and hardware experiments, SMaRTT outperforms Swift, RoCEv2, MPRDMA, and EQDS by up to ~50% in key workloads while improving fairness and convergence speed, and can augment receiver-based CC like EQDS to address fabric congestion. The work demonstrates practical deployability in Ultra Ethernet NSCC contexts, offering scalable per-flow state, low hardware requirements, and compatibility with existing network features such as ECMP, ECN, and trimming.

Abstract

With the rapid growth of artificial intelligence (AI) workloads in datacenters, the Ultra Ethernet Consortium (UEC) has defined a new high-performance transport layer to deliver the required performance at scale. A core component of this new standard is the Network Signal-based Congestion Control (NSCC) algorithm. This paper presents SMaRTT, the algorithm that forms the basis of the UEC NSCC specification. SMaRTT is a sender-based congestion control algorithm that systematically combines delay, Explicit Congestion Notification (ECN), and optional packet trimming into a cohesive state machine for fast, fair and precise window adjustments with seamless multipath support. At its core lies the novel QuickAdapt algorithm that accurately estimates and rapidly adapts to available capacity. Our evaluation shows that SMaRTT outperforms existing datacenter congestion control algorithms like Swift, RoCE, and MPRDMA by up to 50% and provides superior fairness, validating the design choices made in the UEC standard.

Paper Structure (53 sections, 3 equations, 17 figures, 4 algorithms)

Figures (17)

  • Figure 1: Comparison of SMaRTT with recent CC algorithms on an 8:1 oversubscribed fat tree running a permutation. The first number under each algorithm indicates the overall completion time, while the second is the time difference between the fastest and slowest flow.
  • Figure 2: High level block diagram of SMaRTT.
  • Figure 3: SMaRTT running a permutation workload with different load balancers.
  • Figure 4: Comparing ECN on egress and delay-based congestion signals during an incast.
  • Figure 5: cwnd evolution over time for three variants of SMaRTT. The red stars indicate when NACKs happen.
  • ...and 12 more figures