Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry

A. L. García Navarro; Nataliia Koneva; Alfonso Sánchez-Macián; José Alberto Hernández; Óscar González de Dios; J. M. Rivas-Moscoso

TL;DR

The paper addresses routing in packet-optical networks under changing optical quality and congestion by leveraging telemetry from the physical and link layers. It adopts a reinforcement-learning approach, specifically model-free Q-learning, with rewards crafted from propagation delay, link load, and pre-FEC BER to guide routing decisions, as formalized by the update $Q(s,a) \leftarrow Q(s,a) + \alpha [ R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) ]$. An open-source implementation is provided and demonstrated on an 8-node topology and a 23-node Tokyo MAN, showing that the policy can dynamically adapt to link degradation by switching to primary or secondary routes. The work contributes a practical reward-generation methodology that integrates optical and packet telemetry and a scalable, zero-touch routing approach suitable for deployment with Path Computation Elements (PCE).
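To make the update rule above concrete, the following is a minimal sketch of tabular Q-learning for next-hop selection. It is not the authors' implementation: the 4-node graph, the per-link rewards, the `goal_bonus` term, the epsilon-greedy exploration, and all hyperparameter values are assumptions chosen only for illustration. States are nodes, actions are next hops, and the reward of a hop is the (negative) cost of the link traversed.

```python
import random
from collections import defaultdict


def q_learning_route(links, source, dest, alpha=0.5, gamma=0.9,
                     epsilon=0.2, episodes=2000, goal_bonus=100.0, seed=0):
    """Tabular Q-learning on a graph (illustrative sketch).

    links: dict mapping a directed link (u, v) to its reward R(u, v).
    goal_bonus is a hypothetical extra reward for reaching the destination.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(list)
    for (u, v) in links:
        neighbours[u].append(v)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0

    for _ in range(episodes):
        s = source
        for _ in range(50):  # cap on hops per episode
            acts = neighbours[s]
            if s == dest or not acts:
                break
            # Epsilon-greedy action selection (an assumed exploration scheme).
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda n: q[(s, n)])
            r = links[(s, a)] + (goal_bonus if a == dest else 0.0)
            # The destination is terminal, so it has no future value.
            future = 0.0 if a == dest else max(
                (q[(a, n)] for n in neighbours[a]), default=0.0)
            # Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = a
    return q, neighbours


def greedy_path(q, neighbours, source, dest, max_hops=50):
    """Follow the highest-valued next hop from source until dest is reached."""
    path = [source]
    while path[-1] != dest and len(path) <= max_hops:
        s = path[-1]
        path.append(max(neighbours[s], key=lambda n: q[(s, n)]))
    return path


def bidirectional(costs):
    """Turn {(u, v): cost} into directed links with reward = -cost."""
    links = {}
    for (u, v), c in costs.items():
        links[(u, v)] = -c
        links[(v, u)] = -c
    return links


# Hypothetical topology: A-B-D is the cheap route, A-C-D the backup.
healthy = bidirectional({("A", "B"): 1, ("B", "D"): 1,
                         ("A", "C"): 5, ("C", "D"): 5})
degraded = bidirectional({("A", "B"): 50, ("B", "D"): 1,
                          ("A", "C"): 5, ("C", "D"): 5})

for name, links in (("healthy", healthy), ("degraded", degraded)):
    q, nb = q_learning_route(links, "A", "D")
    print(name, greedy_path(q, nb, "A", "D"))
```

Retraining from scratch after the A-B link is penalised is the simplest way to show the policy moving to the secondary route; whether the Q-table is reset or warm-started after a telemetry change is a design choice this sketch does not take from the paper.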

Abstract

This article provides a methodology and open-source implementation of Reinforcement Learning algorithms for finding optimal routes in a packet-optical network scenario. The algorithm uses measurements provided by the physical layer (pre-FEC bit error rate and propagation delay) and the link layer (link load) to configure a set of latency-based rewards and penalties. It then runs Q-learning on this set of rewards to find the optimal routing strategies. It is further shown that the algorithm dynamically adapts to changing network conditions by recalculating optimal policies upon either link load changes or link degradation as measured by pre-FEC BER.
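As an illustration of how the three telemetry inputs could be folded into a single latency-based reward, here is a minimal sketch. The abstract does not give the formula, so everything specific below is an assumption: the 5 µs/km propagation figure for fibre, the M/M/1 approximation for queueing delay, the `service_time_us` value, the pre-FEC BER threshold, and the size of the penalty are all hypothetical choices, not values from the paper.

```python
def link_reward(length_km, load, pre_fec_ber,
                service_time_us=1.2, ber_threshold=1e-3,
                ber_penalty_us=1e4):
    """Return a latency-based reward (negative delay in microseconds).

    length_km    -- fibre length, used to estimate propagation delay
    load         -- link utilisation in [0, 1) reported by the link layer
    pre_fec_ber  -- pre-FEC bit error rate reported by the optical layer
    All default parameter values are hypothetical.
    """
    # Propagation delay: roughly 5 microseconds per km of fibre.
    prop_us = 5.0 * length_km
    # Queueing plus transmission delay, approximated with an M/M/1 queue
    # (an assumed model): mean sojourn time = service_time / (1 - load).
    rho = min(max(load, 0.0), 0.999)  # clamp to avoid division by zero
    queue_us = service_time_us / (1.0 - rho)
    # Penalise links whose pre-FEC BER exceeds the assumed threshold.
    penalty_us = ber_penalty_us if pre_fec_ber > ber_threshold else 0.0
    return -(prop_us + queue_us + penalty_us)


healthy = link_reward(length_km=10, load=0.5, pre_fec_ber=1e-5)
congested = link_reward(length_km=10, load=0.9, pre_fec_ber=1e-5)
degraded = link_reward(length_km=10, load=0.5, pre_fec_ber=1e-2)
print(healthy, congested, degraded)
```

Expressing the BER penalty in the same unit as the delays keeps the reward one-dimensional, so a degraded link simply looks like a very slow link to the learning agent.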

Paper Structure

This paper contains 6 sections, 1 equation, 2 figures, and 2 tables.

Introduction
Background and methodology
Simulation scenario and RL-based solution
Example on a small network topology
Extended example on a large topology: Tokyo MAN
Summary and discussion

Figures (2)

Figure 1: 8-node topology example
Figure 2: Tokyo topology example
