A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network
Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler
TL;DR
This paper reports the first real-world deployment of Slim Fly (SF), a diameter-2 interconnect topology designed to reduce cost and power while maintaining high performance. It introduces a novel high-performance multipath routing scheme based on layered FatPaths that enables multiple disjoint, near-minimal paths without excessive deadlock-avoidance constraints, and demonstrates its effectiveness on a 200-node InfiniBand cluster. In an extensive evaluation against a 2-level Fat Tree across diverse workloads, including microbenchmarks, scientific HPC tests, and distributed deep learning proxies, the Slim Fly system performs comparably to or better than the Fat Tree while delivering strong scalability and substantial cost savings at large scales. The work further provides automated cabling scripts, correctness verification tooling, and a portable routing architecture, facilitating practical deployment of low-diameter interconnects beyond the SF installation presented.
Abstract
Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, and Dragonfly topologies. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on the deployment, management, and operational aspects of our test cluster with 200 servers and carefully analyze its performance. We demonstrate techniques for simple cabling and cabling validation as well as a novel high-performance routing architecture for InfiniBand-based low-diameter topologies. Our real-world benchmarks show SF's strong performance for many modern workloads such as deep neural network training, graph analytics, and linear algebra kernels. SF outperforms non-blocking Fat Trees in scalability while offering comparable or better performance and lower cost for large network sizes. Our work can facilitate the deployment of SF, and the associated open-source routing architecture is fully portable and applicable to accelerating any low-diameter interconnect.
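To make the diameter-2 property concrete, the following minimal sketch builds a small Slim Fly router graph and checks its size, degree, and diameter with breadth-first search. It assumes the MMS graph construction from the original Slim Fly proposal, with the illustrative parameters q = 5 and primitive element xi = 2; neither value is stated in this text. The function names `build_slim_fly` and `diameter` are hypothetical and are not part of the authors' tooling.

```python
from collections import deque
from itertools import product


def build_slim_fly(q=5, xi=2):
    """Router adjacency of an MMS graph (assumes prime q = 4w + 1, xi primitive mod q)."""
    even_powers = {pow(xi, e, q) for e in range(0, q - 1, 2)}  # generator set X
    odd_powers = {pow(xi, e, q) for e in range(1, q - 1, 2)}   # generator set X'
    routers = [(s, a, b) for s in (0, 1) for a in range(q) for b in range(q)]
    adj = {r: set() for r in routers}

    # Links inside each group: (s, a, b) -- (s, a, b + d) for d in X (s = 0) or X' (s = 1).
    for s, gens in ((0, even_powers), (1, odd_powers)):
        for a, b, d in product(range(q), range(q), gens):
            u, v = (s, a, b), (s, a, (b + d) % q)
            adj[u].add(v)
            adj[v].add(u)

    # Links between the two halves: (0, x, y) -- (1, m, c) whenever y = m*x + c (mod q).
    for x, m, c in product(range(q), repeat=3):
        u, v = (0, x, (m * x + c) % q), (1, m, c)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def diameter(adj):
    """Longest shortest path in hops, computed by a BFS from every router."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        worst = max(worst, max(dist.values()))
    return worst


adj = build_slim_fly()
print(len(adj), {len(n) for n in adj.values()}, diameter(adj))
```

Under these assumptions the graph has 50 routers, each with 7 inter-router links, and every pair of routers is at most 2 hops apart. With four servers attached per router (also an assumption), 50 routers would correspond to the 200-server scale mentioned above. The same BFS check could in principle be run on a link list discovered from a live fabric as a coarse sanity check of the cabling.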
