Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing

Yung-Fu Chen, Anish Arora

Abstract

Deep learning models that rely on local features have shown significant potential for near-optimal routing in dense Euclidean graphs, but they generalize poorly to sparse networks, where topological irregularities require broader structural awareness. To address this limitation, we train a Cascaded Mixture of Experts (Ca-MoE) to solve the all-pairs near-shortest path (APNSP) routing problem. Ca-MoE is a modular two-tier architecture for forwarder selection in which lower-tier experts rely on local features and upper-tier experts rely on global features. It performs adaptive inference: the upper-tier experts are triggered only when the lower-tier ones do not achieve adequate decision quality. Model capacity is thus escalated only when topological complexity requires it, which yields computational efficiency and avoids parameter redundancy. We further incorporate an online meta-learning strategy that fine-tunes experts independently and uses a stability-focused update mechanism to prevent catastrophic forgetting as new graph environments are encountered. Experimental evaluations show that Ca-MoE routing improves accuracy by up to 29.1% in sparse networks compared with single-expert baselines and keeps performance within 1% to 6% of the theoretical upper bound across diverse graph densities.

Paper Structure (30 sections, 6 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: The Ca-MoE framework for forwarder selection. The Router directs the graph sample $\langle O,D,v,nbr(v) \rangle$ to a selected lower-tier expert that predicts a $Q$-value for each neighbor node $u$ in $nbr(v)$, where $nbr(v)$ denotes the set of $v$'s neighbors. The candidate forwarder $\hat{u}$ for a near-shortest path from node $v$ to destination node $D$ is identified as a node in $nbr(v)$ with the highest $Q$-value. This candidate is validated by the Deferral rule: the sample is cascaded to the upper-tier expert only if the confidence is insufficient. Dashed lines denote conditional execution paths, illustrating selective activation based on the graph sample (in orange) and the adaptive activation based on the candidate forwarder for the graph sample (in purple).
  • Figure 2: Comparative performance of the Greedy-Tensile, Greedy-Lax, and Greedy-Spectral routing policies across different densities of Euclidean graphs. Each sub-figure illustrates a set of graphs where a different routing policy dominates: different graphs are enumerated on the x-axis; for each graph, the y-axis denotes the frequency with which each expert selects the best next forwarder for nodes $v$ given samples $\langle O,D,v,nbr(v) \rangle$ from that graph. In sub-figures (b) and (c), the choices for $v$ are constrained relative to the shortest paths from origin $O$ to $D$. Recall that a forwarder is determined to be the "best" if it yields the maximal optimal $Q$-value, $Q^*(O, D, v, u)$.
  • Figure 3: The online meta-learning pipeline. Upon the arrival of a new graph $G_t$, experts are selectively fine-tuned via a fine-tuning sampler. To prevent catastrophic forgetting, updated models ($H'$) are validated against a persistent Evaluation Dataset ($ED$) before replacing the current experts ($H^{t-1}$).
  • Figure 4: Comparative APNSP prediction accuracy on individual graph instances ($G_t$) for Ca-MoE against single-expert baselines. The x-axis enumerates specific graph instances, while the y-axis reports the routing accuracy.
  • Figure 5: Evolution of average APNSP prediction accuracy across the sequence of seen graphs $G_1, \dots, G_t$ shown in Figure 4. The figure illustrates the continual generalization capability of the framework. The x-axis represents the chronological sequence of arriving graphs, and the y-axis denotes the average accuracy on all graphs encountered up to time $t$.
  • ...and 2 more figures
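
The cascaded inference described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The caption says only that a sample is deferred when "the confidence is insufficient", so the confidence measure used here (the gap between the two largest $Q$-values) and the `threshold` parameter are assumptions. The expert and router interfaces, and the names `select_forwarder`, `margin_confidence`, and `argmax_q`, are likewise hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Node = Hashable
# A sample mirrors <O, D, v, nbr(v)> from the Figure 1 caption.
Sample = Tuple[Node, Node, Node, List[Node]]
# Hypothetical expert interface: one Q-value per neighbor u in nbr(v).
Expert = Callable[[Sample], Dict[Node, float]]


def argmax_q(q_values: Dict[Node, float]) -> Node:
    """Return the neighbor with the highest Q-value."""
    return max(q_values, key=q_values.get)


def margin_confidence(q_values: Dict[Node, float]) -> float:
    """ASSUMPTION: confidence is the gap between the two largest Q-values.

    The source does not specify the confidence measure of the Deferral rule.
    """
    ranked = sorted(q_values.values(), reverse=True)
    return float("inf") if len(ranked) < 2 else ranked[0] - ranked[1]


def select_forwarder(
    sample: Sample,
    router: Callable[[Sample], int],
    lower_experts: List[Expert],
    upper_expert: Expert,
    threshold: float,
) -> Tuple[Node, bool]:
    """Pick a forwarder; the flag reports whether the upper tier was used."""
    # Selective activation: the router picks one lower-tier expert.
    q_lower = lower_experts[router(sample)](sample)
    candidate = argmax_q(q_lower)
    # Deferral rule (assumed form): keep the candidate if confidence suffices.
    if margin_confidence(q_lower) >= threshold:
        return candidate, False
    # Adaptive activation: cascade the sample to the upper-tier expert.
    return argmax_q(upper_expert(sample)), True
```

With this structure the upper-tier expert is evaluated only for samples that fail the deferral check, which is the source of the computational savings the abstract describes.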
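
The validation step in the Figure 3 caption, where updated models $H'$ are checked against the Evaluation Dataset $ED$ before replacing $H^{t-1}$, might look like the sketch below. The acceptance criterion is an assumption: a fine-tuned expert is accepted only if its accuracy on $ED$ is no lower than the current expert's, up to a hypothetical `tolerance`. Checking each expert separately is also an assumption, and `validated_update` and `accuracy_on` are illustrative names.

```python
from typing import Callable, Dict, Sequence, TypeVar

Model = TypeVar("Model")
Example = TypeVar("Example")


def validated_update(
    current: Dict[str, Model],       # H^{t-1}: experts currently deployed
    candidates: Dict[str, Model],    # H': fine-tuned experts for graph G_t
    eval_dataset: Sequence[Example], # ED: persistent evaluation dataset
    accuracy_on: Callable[[Model, Sequence[Example]], float],
    tolerance: float = 0.0,
) -> Dict[str, Model]:
    """Return the expert set after gating each candidate on ED accuracy."""
    accepted = dict(current)
    for name, new_model in candidates.items():
        old_score = accuracy_on(current[name], eval_dataset)
        new_score = accuracy_on(new_model, eval_dataset)
        # ASSUMED criterion: replace only if ED accuracy does not drop
        # by more than `tolerance`; otherwise keep the previous expert.
        if new_score >= old_score - tolerance:
            accepted[name] = new_model
    return accepted
```

Experts that were not fine-tuned for the new graph are absent from `candidates` and carry over unchanged, which matches the selective fine-tuning the caption mentions.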