SMaRTT: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport
Tommaso Bonato, Abdul Kabbani, Ahmad Ghalayini, Anup Agarwal, Daniele De Sensi, Rong Pan, Costin Raiciu, Mark Handley, Mihai Brodschi, Timo Schneider, Nils Blach, Daniel Santos Ferreira Alves, Torsten Hoefler
TL;DR
SMaRTT targets high-throughput, low-latency congestion control for AI- and HPC-centric datacenters by combining sender-based window control with ECN marks, delay feedback, and optional packet trimming. The design introduces QuickAdapt for rapid adaptation to available capacity, Fair Increase for equitable bandwidth sharing, and tight integration with the load balancer to improve path utilization in multipath networks. In simulations and hardware experiments, SMaRTT outperforms Swift, RoCEv2, MPRDMA, and EQDS by up to 50% in key workloads while improving fairness and convergence speed, and it can complement receiver-based congestion control such as EQDS to address congestion in the fabric. The work demonstrates practical deployability in Ultra Ethernet NSCC contexts, offering per-flow state that scales, low hardware requirements, and compatibility with existing network features such as ECMP, ECN, and trimming.
Abstract
With the rapid growth of artificial intelligence (AI) workloads in datacenters, the Ultra Ethernet Consortium (UEC) has defined a new high-performance transport layer to deliver the required performance at scale. A core component of this new standard is the Network Signal-based Congestion Control (NSCC) algorithm. This paper presents SMaRTT, the algorithm that forms the basis of the UEC NSCC specification. SMaRTT is a sender-based congestion control algorithm that systematically combines delay, Explicit Congestion Notification (ECN), and optional packet trimming into a cohesive state machine for fast, fair and precise window adjustments with seamless multipath support. At its core lies the novel QuickAdapt algorithm that accurately estimates and rapidly adapts to available capacity. Our evaluation shows that SMaRTT outperforms existing datacenter congestion control algorithms like Swift, RoCE, and MPRDMA by up to 50% and provides superior fairness, validating the design choices made in the UEC standard.
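To make the idea of combining ECN and delay into a small state machine concrete, the following is a minimal sketch of a per-ACK window update driven by the two signals. It is an illustration only, not the SMaRTT or NSCC algorithm: the four state names, the mapping from signals to states, the gain constants, and all identifiers (`AckSignal`, `classify`, `update_window`, `target_rtt_us`) are assumptions made for this example, and QuickAdapt and trimming are left out.

```python
from dataclasses import dataclass


@dataclass
class AckSignal:
    """Feedback carried by one ACK (field names are hypothetical)."""
    ecn_marked: bool   # the acknowledged packet was ECN-marked
    rtt_us: float      # measured round-trip time in microseconds
    acked_bytes: int   # bytes acknowledged by this ACK


def classify(sig: AckSignal, target_rtt_us: float) -> str:
    """Map the (ECN, delay) pair to one of four illustrative states."""
    high_delay = sig.rtt_us > target_rtt_us
    if sig.ecn_marked and high_delay:
        return "decrease_strong"   # both signals report congestion
    if sig.ecn_marked:
        return "decrease_gentle"   # only ECN reports congestion
    if high_delay:
        return "increase_gentle"   # only delay is elevated
    return "increase_fast"         # neither signal reports congestion


def update_window(cwnd: float, sig: AckSignal, target_rtt_us: float,
                  mtu: int = 4096, min_cwnd: float = 4096,
                  max_cwnd: float = 1_000_000) -> float:
    """Return the new congestion window in bytes (gains are assumed)."""
    state = classify(sig, target_rtt_us)
    if state == "decrease_strong":
        cwnd -= 0.5 * sig.acked_bytes
    elif state == "decrease_gentle":
        cwnd -= 0.1 * sig.acked_bytes
    elif state == "increase_gentle":
        # Additive increase: about one MTU per window of ACKs.
        cwnd += mtu * sig.acked_bytes / cwnd
    else:
        # Fast increase: grow by the number of bytes acknowledged.
        cwnd += sig.acked_bytes
    # Keep the window within the configured bounds.
    return min(max(cwnd, min_cwnd), max_cwnd)
```

In this sketch the sender calls `update_window` once per ACK, so the reaction is proportional to the number of bytes acknowledged and no switch or receiver state is required beyond the ECN bit and an RTT measurement.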
