Flowcut Switching: High-Performance Adaptive Routing with In-Order Delivery Guarantees

Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Abdulla Bataineh, David Hewson, Duncan Roweth, Torsten Hoefler

TL;DR

Flowcut switching addresses the challenge of maintaining in-order packet delivery under adaptive routing in data-center networks, particularly for RDMA RoCE and latency-sensitive transports. It achieves this by maintaining per-flow Flowcut state, creating new Flowcuts only when there are no in-flight packets, and employing a drainage mechanism with RTT-based signals to reroute congestion-affected flows. The paper presents three deployment variants (Full switch, Ingress-only, NIC-only) and demonstrates through simulations and Slingshot hardware experiments that Flowcut can yield up to 50% better flow completion times than ECMP and up to 40% better than Flowlet, while guaranteeing in-order delivery and tolerating failures with up to 5x improvement in tail scenarios. The approach is practical on commodity hardware with modest memory overhead and offers a flexible path to incremental deployment across switches or NICs, broadening the applicability of robust, in-order adaptive routing for modern data-center workloads.

Abstract

Network latency severely impacts the performance of applications running on supercomputers. Adaptive routing algorithms route packets over different available paths to reduce latency and improve network utilization. However, if a switch routes packets belonging to the same network flow on different paths, they might arrive at the destination out of order due to differences in the latency of these paths. For some transport protocols like TCP, QUIC, and RoCE, out-of-order (OOO) packets might cause large performance drops or significantly increase CPU utilization. In this work, we propose flowcut switching, a new adaptive routing algorithm that provides high-performance in-order packet delivery. Unlike existing solutions such as flowlet switching, which assume bursty traffic and might still reorder packets, flowcut switching guarantees in-order delivery under any network conditions and is also effective for non-bursty traffic, as is often the case for RDMA.
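
The core rule summarized in the TL;DR, that a flow may move to a new path only when it has no packets in flight, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `FlowState`, `FlowcutRouter`, `on_packet`, and `on_ack` are hypothetical, counting in-flight packets through acknowledgements is an assumption, the least-loaded path choice is a placeholder policy, and the RTT-based drainage mechanism is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FlowState:
    path: int           # path currently pinned to this flow
    in_flight: int = 0  # packets forwarded but not yet acknowledged


@dataclass
class FlowcutRouter:
    """Toy model: a flow changes path only when nothing is in flight."""
    num_paths: int
    flows: dict = field(default_factory=dict)

    def pick_path(self, load):
        # Placeholder policy (assumption): choose the least-loaded path.
        return min(range(self.num_paths), key=lambda p: load[p])

    def on_packet(self, flow_id, load):
        state = self.flows.get(flow_id)
        if state is None:
            # First packet of the flow: any path is safe.
            state = FlowState(path=self.pick_path(load))
            self.flows[flow_id] = state
        elif state.in_flight == 0:
            # No packets in flight, so a new path cannot cause reordering.
            state.path = self.pick_path(load)
        # Otherwise keep the current path to preserve ordering.
        state.in_flight += 1
        return state.path

    def on_ack(self, flow_id):
        # One previously forwarded packet has been acknowledged.
        self.flows[flow_id].in_flight -= 1


router = FlowcutRouter(num_paths=2)
first = router.on_packet("flow-a", load=[0, 5])   # path 0 is least loaded
second = router.on_packet("flow-a", load=[9, 0])  # packets in flight: stay put
router.on_ack("flow-a")
router.on_ack("flow-a")
third = router.on_packet("flow-a", load=[9, 0])   # drained: free to reroute
```

In this toy run the second packet stays on the first packet's path even though the load has shifted, and the flow is rerouted only after both acknowledgements arrive.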

Paper Structure

This paper contains 31 sections, 1 equation, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Optimal flowlet window for different workloads and network conditions. The optimal performance is calculated using the average flow completion time. Web search and Alibaba are traces with their distribution presented in Figure \ref{fig:distr}.
  • Figure 2: Network example.
  • Figure 3: Flowcut switching example.
  • Figure 4: Maximum switch memory occupancy (MiB) of Flowcut switching on different configurations on a network with 2 KiB MTU and 64 input and output ports per switch. (a) 1024 hosts on a 200 Gb/s network, for different RTTs. (b) 1024 hosts, 5 microseconds maximum RTT, for different network bandwidths. (c) 800 Gb/s network, 5 microseconds maximum RTT, for different host counts.
  • Figure 5: Maximum memory occupancy (MiB) of different algorithms ($10^4$ flows per host), on a 200 Gb/s network with 2 KiB MTU connecting 1024 hosts. The memory requirement is per switch in all cases except the NIC-only variant, where it is per NIC.
  • ...and 9 more figures