Traffic Engineering in Large-scale Networks with Generalizable Graph Neural Networks

Fangtong Zhou, Xiaorui Liu, Ruozhou Yu, Guoliang Xue

TL;DR

The paper tackles traffic engineering (TE) in large-scale networks by addressing the scalability and generalization gaps of existing learning-based TE methods. It introduces TELGEN, a graph-neural-network (GNN) framework that learns to imitate the steps of a classical interior-point method (IPM) by transforming TE into a graph-structured LP and aligning the GNN with IPM iterations. TELGEN employs a double-loop architecture with K outer iterations and J inner layers, and is trained with strong supervision on intermediate IPM updates, which enables topology-agnostic generalization. Empirical results show that TELGEN achieves near-optimal TE solutions with optimality gaps below 3%, saves up to 84% of solving time relative to IPM, and delivers substantial speedups over TEAL and HARP, while generalizing to networks with up to 20× more nodes than those seen in training and to unseen demand distributions. These findings position TELGEN as a scalable, generalizable building block for automated, high-performance TE in modern WANs and cloud networks.
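The double-loop idea can be made concrete with a toy forward pass. What follows is a minimal sketch, not the paper's model: the function names `double_loop_forward` and `stepwise_loss`, the `tanh` layer, the residual update, and the sharing of the J layers across all K outer iterations are assumptions made purely for illustration. It only shows how K outer iterations, each made of J inner layers, yield K intermediate states that can be compared one by one against recorded IPM iterates.

```python
import numpy as np


def double_loop_forward(h0, adj, layers, K):
    """Run K outer iterations; each one applies the same J inner layers.

    h0:     (n, d) initial vertex features
    adj:    (n, n) adjacency matrix of the LP graph
    layers: list of J weight matrices, each of shape (d, d)
    Returns a list with the K intermediate states, one per outer iteration.
    """
    h = h0
    states = []
    for _ in range(K):                      # outer loop: one simulated IPM iteration
        step = h
        for W in layers:                    # inner loop: J message-passing layers
            step = np.tanh(adj @ step @ W)  # aggregate neighbours, then transform
        h = h + step                        # residual update (illustrative assumption)
        states.append(h)
    return states


def stepwise_loss(states, ipm_iterates):
    """Mean squared error averaged over every intermediate iterate."""
    per_step = [np.mean((s - t) ** 2) for s, t in zip(states, ipm_iterates)]
    return float(np.mean(per_step))


# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
n, d, J, K = 6, 4, 3, 5
adj = (rng.random((n, n)) < 0.5).astype(float)
layers = [rng.normal(scale=0.1, size=(d, d)) for _ in range(J)]
states = double_loop_forward(rng.normal(size=(n, d)), adj, layers, K)
```

Supervising every element of `states`, rather than only the last one, is what the summary calls strong supervision on intermediate IPM updates; how the paper actually parameterizes and weights these terms is not stated here.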

Abstract

Traffic engineering (TE) in large-scale computer networks has become a fundamental yet challenging problem, owing to the swift growth of global-scale cloud wide-area networks and backbone low-Earth-orbit satellite constellations. To address the scalability issues of traditional TE algorithms, learning-based approaches have been proposed, showing potential for significant efficiency improvements over state-of-the-art methods. Nevertheless, intrinsic limitations of existing learning-based methods hinder their practical application: they do not generalize across diverse topologies and network conditions, incur excessive training overhead, and do not respect link capacities by default. This paper proposes TELGEN, a novel TE algorithm that learns to solve TE problems efficiently in large-scale networks while achieving superior generalizability across diverse network conditions. TELGEN is based on the novel idea of transforming the problem of "predicting the optimal TE solution" into "predicting the optimal TE algorithm", which enables TELGEN to learn and efficiently approximate the end-to-end solving process of classical optimal TE algorithms. The learned algorithm is agnostic to the exact network topology and traffic patterns; it can efficiently solve TE problems given arbitrary inputs and generalizes well to unseen topologies and demands. We trained and evaluated TELGEN on random and real-world networks with up to 5000 nodes and $10^6$ links. TELGEN achieved an optimality gap of less than 3% while ensuring feasibility in all cases, even when the test network had up to 20× more nodes than the largest network in training. It also saved up to 84% of solving time compared with a classical optimal solver, and reduced training time per epoch and solving time by 2-4 orders of magnitude compared with the latest learning algorithms on the largest networks.

Paper Structure

This paper contains 30 sections, 10 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Graph representation of a TE instance with one SD pair and two paths. Each vertex denotes a variable, a constraint, or the objective. Directed edges denote their correlations in the LP formulation. GNN message passing is carried out along the edges in six directions, as detailed in the model design section.
  • Figure 2: Double-looped GNN architecture: $K$ outer loops simulating IPM iterations, and $J$ inner layers to learn an IPM step.
  • Figure 3: Prediction time on ER test datasets, by number of nodes and edge probability.
  • Figure 4: Prediction time on WA test datasets, by number of nodes and number of SD pairs.
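The graph representation described in the Figure 1 caption can be sketched in a few lines. This is a hedged illustration only: the LP form (maximize total flow subject to a demand bound and per-link capacities), the helper name `build_lp_graph`, the edge weights, and the reading of the "six directions" as the six ordered pairs among variable, constraint, and objective vertices are all assumptions, not details confirmed by the paper.

```python
def build_lp_graph(c, A, b, var_names, con_names, obj_name="obj"):
    """Build directed edges for  max c^T x  s.t.  A x <= b, x >= 0.

    Each edge is (source, target, weight, direction_label).
    Vertices are the variables, the constraints, and one objective vertex.
    """
    edges = []

    def link(u, v, weight, label_uv, label_vu):
        # Add both directions so messages can flow either way.
        edges.append((u, v, weight, label_uv))
        edges.append((v, u, weight, label_vu))

    for i, row in enumerate(A):              # variable <-> constraint
        for j, a_ij in enumerate(row):
            if a_ij != 0:
                link(var_names[j], con_names[i], a_ij, "var->con", "con->var")
    for j, c_j in enumerate(c):              # variable <-> objective
        if c_j != 0:
            link(var_names[j], obj_name, c_j, "var->obj", "obj->var")
    for i, b_i in enumerate(b):              # constraint <-> objective
        link(con_names[i], obj_name, b_i, "con->obj", "obj->con")
    return edges


# One SD pair with demand 12 and two paths; path 1 uses a link of
# capacity 10 and path 2 a link of capacity 5 (made-up numbers).
var_names = ["x_path1", "x_path2"]
con_names = ["demand", "cap_link1", "cap_link2"]
c = [1, 1]                    # maximize total routed flow
A = [[1, 1],                  # x_path1 + x_path2 <= 12
     [1, 0],                  # x_path1 <= 10
     [0, 1]]                  # x_path2 <= 5
b = [12, 10, 5]

edges = build_lp_graph(c, A, b, var_names, con_names)
directions = sorted({label for _, _, _, label in edges})
```

Because the graph is built from the LP coefficients rather than from the physical topology, the same construction applies to a network of any size, which is consistent with the topology-agnostic claim in the summary.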