A Highly Scalable LLM Clusters with Optical Interconnect
Xinchi Han, Yongxi Lv, Weihao Jiang, Shuyuan Zhang, Yingming Mao, Shizhen Zhao, ZhuoRan Liu, Zhuotao Liu, Peirui Cao, Ximeng Liu, Xinbing Wang
TL;DR
The paper tackles the problem of designing an optimal physical topology for optical circuit switch (OCS)-based GPU clusters, aiming to maximize three properties: logical topology compatibility, cluster scalability, and polynomial-time solvability of topology engineering (ToE). It introduces the Symmetric Integer Matrix Decomposition Theorem to reduce the ToE problem to a polynomial-time minimum-cost flow formulation, and proposes Cross Wiring as a concrete topology that achieves full compatibility, full scalability, and polynomial solvability. The authors prove the optimality of Cross Wiring, discuss the feasibility of online ToE, and validate the approach on a 128-NPU testbed and in large-scale trace-based simulations, reporting throughput gains of up to 28.3% and substantial reductions in maximum link utilization. The work provides a universal, hardware-agnostic pattern for future OCS architectures and demonstrates practical performance benefits for large-scale AI training in OCS-based data centers.
Abstract
Recent years have witnessed the adoption of optical circuit switch (OCS) technology. The design of the physical topology, defined by the physical wiring between electrical switching equipment and the OCS, is fundamental to building efficient OCS-based clusters. We identify three features for evaluating the quality of a physical topology design: logical topology compatibility, cluster scalability, and topology engineering polynomial-solvability. However, none of the existing physical topologies achieves all three features simultaneously. This paper explores the design of an optimal physical topology that maximizes all three at once. We begin by analyzing the importance of these features in OCS-based clusters and examining the limitations of current designs. Leveraging a proposed \emph{Symmetric Integer Matrix Decomposition Theorem}, we outline a general approach for designing optimal physical topologies and introduce \textbf{Cross Wiring} as a concrete instantiation. The feasibility and advantages of Cross Wiring are verified through a 128-NPU testbed and large-scale real-trace-based simulations.
