To Reconfigure or Not to Reconfigure: Optimizing All-to-All Collectives in Circuit-Switched Photonic Interconnects

Anchengcheng Zhou, Vamsi Addanki, Maria Apostolaki

TL;DR

This work optimizes all-to-all communication on circuit-switched photonic interconnects by jointly selecting topology sequences and flow schedules while accounting for realistic reconfiguration costs. It introduces a matrix-based abstraction that expresses strategies as decompositions of the traffic matrix into sums of adjacency matrices and their powers, which yields a closed-form expression for the total cost (TotalCost) and a lower bound on the optimum. By identifying a region of highly symmetric, high-expansion topologies and proposing a simple, contention-free scheduling approach, the method constructs near-optimal strategies without exhaustive search. Empirical evaluation shows up to 44% reduction in completion time across diverse network sizes, topologies, and workloads, and characterizes when reconfiguration is most beneficial along with a quantified optimality gap.

Abstract

All-to-all collective communication is a core primitive in distributed machine learning and high-performance computing. At the server scale, the communication demands of these workloads are increasingly outstripping the bandwidth and energy limits of electrical interconnects, driving a growing interest in photonic interconnects. However, leveraging these interconnects for all-to-all communication is nontrivial. The core challenge lies in jointly optimizing a sequence of topologies and flow schedules, reconfiguring only when the transmission savings from traversing shorter paths outweigh the reconfiguration cost. Yet the search space of this joint optimization is enormous. Existing work sidesteps this challenge by making unrealistic assumptions on reconfiguration costs so that it is never or always worthwhile to reconfigure. In this paper, we show that any candidate sequence of topologies and flow schedules can be expressed as a sum of adjacency matrices and their powers. This abstraction captures the entire solution space and yields a lower bound on all-to-all completion time. Building on this formulation, we identify a family of topology sequences with strong symmetry and high expansion that admits bandwidth-efficient schedules, which our algorithm constructs with low computational overhead. Together, these insights allow us to efficiently construct near-optimal solutions, effectively avoiding enumeration of the combinatorial design space. Evaluation shows that our approach reduces all-to-all completion time by up to 44% on average across a wide range of network parameters, message sizes and workload types.
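
The matrix abstraction can be illustrated on a toy instance. The sketch below is not the authors' code. It assumes unit demand between every ordered pair of nodes, one outgoing circuit per node, and cyclic-shift topologies, and it ignores flow scheduling and contention. The helper names (`shift_matrix`, `powers`, `all_to_all`) are hypothetical. It checks that three different strategies each sum to the all-to-all traffic matrix: one topology with multi-hop paths, $n-1$ topologies with direct circuits, and a two-topology mix.

```python
def shift_matrix(n, s):
    """Adjacency matrix of a topology where node i has one circuit to (i + s) mod n."""
    return [[1 if (i + s) % n == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain square-matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matsum(mats):
    """Element-wise sum of a list of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) for j in range(n)] for i in range(n)]

def powers(a, max_hops):
    """Return [A^1, ..., A^max_hops]; A^h marks the pairs joined by an h-hop path."""
    out, cur = [], a
    for _ in range(max_hops):
        out.append(cur)
        cur = matmul(cur, a)
    return out

def all_to_all(n):
    """Uniform all-to-all traffic: one unit for every ordered pair i != j."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

n = 8  # assumed toy size

# One topology (a directed ring); flows travel 1..n-1 hops.
single_topology = matsum(powers(shift_matrix(n, 1), n - 1))

# n-1 topologies; every flow uses a direct (1-hop) circuit.
direct_only = matsum([shift_matrix(n, s) for s in range(1, n)])

# Two topologies: forward ring for offsets 1..3, reverse ring for offsets 7..4.
mixed = matsum(powers(shift_matrix(n, 1), 3) + powers(shift_matrix(n, n - 1), 4))

print(single_topology == all_to_all(n),
      direct_only == all_to_all(n),
      mixed == all_to_all(n))
```

All three decompositions cover the same traffic matrix. They differ in how many topologies they use and in how many hops each flow traverses, which is the trade-off the paper optimizes.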

Paper Structure

This paper contains 19 sections, 1 theorem, 8 equations, and 14 figures.

Key Result

Theorem 1

Fix integers $n\ge 2$, $k = 1$, and $d\ge 1$. Let $R>0$ denote the per-reconfiguration time and $T>0$ denote the transmission time per unit data per hop. For any valid strategy that performs exactly $d$ reconfigurations, the total completion time $\mathrm{TotalCost}(n,k,d)$ is bounded below by an expression in $R$, $T$, $q=\left\lfloor\frac{n-1}{d}\right\rfloor$, and $u=(n-1)\bmod d$.
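
The quantities $q$ and $u$ in Theorem 1 are the quotient and remainder of dividing $n-1$ by $d$. One natural reading, which is an assumption here and not stated in the text, is that they describe the most even split of each node's $n-1$ destinations across the $d$ topologies: $u$ topologies serve $q+1$ destinations and the other $d-u$ serve $q$. The minimal sketch below computes only this split and does not reproduce the bound itself. `balanced_split` is a hypothetical helper name.

```python
def balanced_split(n, d):
    """Return (q, u, sizes) for the most even split of n-1 items into d groups.

    q = floor((n-1)/d) and u = (n-1) mod d, as defined in Theorem 1.
    Reading the groups as "destinations served per topology" is an assumption.
    """
    if n < 2 or d < 1:
        raise ValueError("requires n >= 2 and d >= 1")
    q, u = divmod(n - 1, d)
    sizes = [q + 1] * u + [q] * (d - u)  # u larger groups, d-u smaller groups
    return q, u, sizes

# Example: n = 8 GPUs and d = 3 reconfigurations.
q, u, sizes = balanced_split(8, 3)
print(q, u, sizes)  # 2 1 [3, 2, 2]
```

The group sizes always add up to $n-1$, since $u(q+1) + (d-u)q = dq + u = n-1$.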

Figures (14)

  • Figure 1: For each instance (a network and a number of reconfigurations), our approach maps the space of feasible solutions (strategies). A strategy is a sequence of topologies together with a flow assignment that maps each flow to one of these topologies so that the collective is satisfied, without lower-level routing details. The approach then identifies a structured region of this space (characterized by topology sequences that exhibit strong symmetry and high expansion) where the regional optimum is easy to find and close to the global optimum. The figure is for illustration only (not to scale).
  • Figure 2: An example scale-up network with $n=8$ GPUs interconnected by $k=2$ optical switches (OSW).
  • Figure 3: An example strategy that reconfigures during the collective. It uses two topologies (two reconfigurations) and schedules flows to run approximately evenly across them in specified rounds. The matrices below the topologies and rounds illustrate how they are captured (as adjacency matrices and their powers) in our abstraction.
  • Figure 4: An example strategy that reconfigures once at the beginning and sends all flows on this topology.
  • Figure 5: An example strategy that reconfigures $n-1$ times and sends all flows on direct circuits. (Not all rounds are illustrated.)
  • ...and 9 more figures

Theorems & Definitions (1)

  • Theorem 1: Lower bound on TotalCost(n,k,d)