Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse

Michael Doherty, Alejandra Beghelli

TL;DR

This work tackles RWA-LR in fixed-grid optical networks with flex-rate transponders and incremental loading by deploying a PPO reinforcement learning agent based on graph attention networks (GATs). It uses the GPU-accelerated XLRON simulator to train on hundreds of millions of simulated samples and benchmarks the agent against optimized heuristics (KSP-FF and FF-KSP) under different path-ordering criteria. The key contributions are a thorough benchmarking of heuristic path ordering (hops vs. length), a new GAT-based RL methodology that achieves modest gains in mean throughput (2.5% over the previous state-of-the-art RL approach and 1.2% over the best heuristic), and the public release of the training framework for reproducibility. The results underscore the difficulty of improving long-horizon resource allocation policies with RL, while offering practical guidance on benchmarking and pointing to the potential value of RL for alternative metrics and future network scenarios.
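The two heuristic families differ only in loop order: KSP-FF tries each candidate path in turn and takes the first free wavelength on it, whereas FF-KSP tries each wavelength index in turn and takes the first candidate path on which it is free. The Python sketch below illustrates this together with the two path-ordering criteria. It is an illustration under assumptions, not the paper's or XLRON's implementation. The `Lightpath`/`Network` classes, the rule of trying lightpath reuse before lighting a new wavelength, breaking hop-count ties by length, and the `capacity_for` reach-to-rate function are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a fixed-grid network with flex-rate transponders.
# Paths are tuples of node names; links are undirected (frozenset of endpoints).


@dataclass
class Lightpath:
    path: tuple        # node sequence the lightpath follows
    wavelength: int    # fixed-grid wavelength index
    capacity: float    # line rate chosen by the transponder (Gbps, assumed)
    used: float = 0.0  # capacity already consumed by services


@dataclass
class Network:
    num_wavelengths: int
    lightpaths: list = field(default_factory=list)


def links(path):
    return [frozenset(edge) for edge in zip(path, path[1:])]


def path_length(path, link_length):
    return sum(link_length[link] for link in links(path))


def order_paths(paths, link_length, by="hops"):
    """Order by hop count (ties broken by length, an assumption) or by total length."""
    if by == "hops":
        return sorted(paths, key=lambda p: (len(p) - 1, path_length(p, link_length)))
    return sorted(paths, key=lambda p: path_length(p, link_length))


def wavelength_free(net, path, w):
    """True if no existing lightpath on wavelength w shares a link with `path`."""
    wanted = set(links(path))
    return not any(lp.wavelength == w and not wanted.isdisjoint(links(lp.path))
                   for lp in net.lightpaths)


def allocate(net, paths, demand, capacity_for, path_first=True):
    """path_first=True behaves like KSP-FF; path_first=False like FF-KSP."""
    # Assumed rule: first try to place the service on an existing lightpath
    # that follows a candidate path and still has enough spare capacity.
    for p in paths:
        for lp in net.lightpaths:
            if lp.path == p and lp.capacity - lp.used >= demand:
                lp.used += demand
                return ("reuse", p, lp.wavelength)
    # Otherwise light a new wavelength. The loop order is the only difference
    # between KSP-FF (paths outer) and FF-KSP (wavelengths outer).
    wavelengths = range(net.num_wavelengths)
    pairs = ([(p, w) for p in paths for w in wavelengths] if path_first
             else [(p, w) for w in wavelengths for p in paths])
    for p, w in pairs:
        rate = capacity_for(p)
        if rate >= demand and wavelength_free(net, p, w):
            net.lightpaths.append(Lightpath(p, w, rate, demand))
            return ("new", p, w)
    return None  # service blocked
```

Consider a one-hop 1200 km path A–C and a two-hop 1000 km path A–B–C. Ordering by hops tries A–C first, while ordering by length tries A–B–C first. Once wavelength 0 is full on A–C, KSP-FF lights wavelength 1 on A–C, whereas FF-KSP reuses wavelength 0 on A–B–C.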

Abstract

Many works have investigated reinforcement learning (RL) for routing and spectrum assignment on flex-grid networks, but only one work to date has examined RL for fixed-grid networks with flex-rate transponders, despite production systems using this paradigm. Flex-rate transponders allow existing lightpaths to accommodate new services, a task we term routing and wavelength assignment with lightpath reuse (RWA-LR). We re-examine this problem and present a thorough benchmarking of heuristic algorithms for RWA-LR, showing that they achieve 6% higher throughput when candidate paths are ordered by number of hops rather than by total length. We train an RL agent for RWA-LR that uses graph attention networks for the policy and value functions to exploit the graph-structured data. We provide details of our methodology and open-source all of our code for reproducibility. We outperform the previous state-of-the-art RL approach by 2.5% (17.4 Tbps mean additional throughput) and the best heuristic by 1.2% (8.5 Tbps mean additional throughput). This marginal gain highlights the difficulty of learning effective RL policies for long-horizon resource allocation tasks.
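To make the graph-attention idea concrete, the Python sketch below implements a minimal single-head layer in the style of the original GAT formulation (Veličković et al.). Each node projects its features with a shared matrix W and scores itself and each neighbour with LeakyReLU(a^T [W h_i || W h_j]). The scores are normalised with a softmax, and the node outputs the attention-weighted sum of the projected features. This is an illustration under assumptions only. The feature layout, the use of node rather than link features, the single head and the missing output nonlinearity are our simplifications, and the paper's GAT policy and value networks will differ.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (illustrative sketch).

    h:   list of node feature vectors (length in_dim each)
    adj: dict mapping node index -> neighbour indices (self excluded)
    W:   shared projection matrix, out_dim rows x in_dim columns
    a:   attention vector of length 2 * out_dim
    Returns the new node features and the attention weights per node.
    """
    z = [matvec(W, hi) for hi in h]  # projected features W h_i
    out, attention = [], []
    for i in range(len(h)):
        nbrs = [i] + list(adj[i])  # attend to self and to graph neighbours
        # score e_ij = LeakyReLU(a . [z_i || z_j]); list + list concatenates
        scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, z[i] + z[j])))
                  for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alpha)))
        out.append([sum(al * z[j][k] for al, j in zip(alpha, nbrs))
                    for k in range(len(z[i]))])
    return out, attention
```

With `a` set to zeros, every score is equal and the layer reduces to a plain mean over each node and its neighbours. Learning `a` is what lets the network weight some neighbours more heavily than others.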

Paper Structure

This paper contains 7 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Outline of our RL training and optical network simulation framework, XLRON. The hierarchy of Device$_{LEARN}$, Learner, Device$_{ENV}$ and Environment computational abstractions is shown on the left. The right side shows details of the training loop for a single Learner (set of neural network parameters) acting over parallel environments. The topology shown is illustrative; we use the NSFNET topology for our studies.
  • Figure 2: Training of our agent, compared with published RL results from Nevin et al. and our strongest heuristic benchmark (5-SP-FF with paths ordered by hops). Shaded areas indicate standard deviations. We used 100 parallel environments, with mean and standard deviation of accepted services at end of each episode calculated across environments.
  • Figure 3: Each heuristic and RL solution was evaluated on 100 episodes of 10,000 service requests. Boxplots show the mean, median, standard deviation and 1.5 × interquartile range for accepted services from those episodes. #hops indicates heuristics with candidate paths ordered by number of hops.
  • Figure 4: Comparison of services accepted by XLRON vs. KSP-FF$_{hops}$ for the same evaluation episodes (same sequences of service requests). Green bars indicate additional services accepted by XLRON, red indicates more services accepted by KSP-FF$_{hops}$.
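The evaluation behind Figures 3 and 4 can be sketched with Python's standard `statistics` module. Every method is evaluated on 100 episodes, the boxplots summarise accepted services per episode, and per-episode differences are taken on identical request sequences. The function names and example numbers are hypothetical. Quartile conventions also differ between tools, so the whisker positions here may not match the paper's plotting library exactly.

```python
import statistics


def box_summary(accepted):
    """Summary statistics behind a boxplot of accepted services per episode."""
    q1, _, q3 = statistics.quantiles(accepted, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in accepted if lo_limit <= x <= hi_limit]
    return {
        "mean": statistics.mean(accepted),
        "median": statistics.median(accepted),
        "std": statistics.stdev(accepted),
        "q1": q1,
        "q3": q3,
        # whiskers reach the most extreme observations within 1.5 x IQR of the box
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in accepted if x < lo_limit or x > hi_limit],
    }


def paired_comparison(agent, baseline):
    """Per-episode differences when both methods see identical request sequences."""
    if len(agent) != len(baseline):
        raise ValueError("both methods must be evaluated on the same episodes")
    diffs = [a - b for a, b in zip(agent, baseline)]
    return {
        "diffs": diffs,                       # > 0: agent accepted more (green bar)
        "wins": sum(d > 0 for d in diffs),
        "losses": sum(d < 0 for d in diffs),  # < 0: baseline accepted more (red bar)
        "ties": sum(d == 0 for d in diffs),
        "mean_diff": statistics.mean(diffs),
    }
```

In this sketch, `box_summary` would be called once per heuristic or agent (as in Figure 3) and `paired_comparison` once per agent/baseline pair (as in Figure 4). Pairing on identical request sequences removes the episode-to-episode variation in traffic that the unpaired boxplots still contain.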