Learning-Based Adaptive Dynamic Routing with Stability Guarantee for a Single-Origin-Single-Destination Network

Yidan Wu; Feixiang Shu; Jianan Zhang; Li Jin

TL;DR

This work tackles dynamic routing in a single-origin-single-destination (SOSD) queuing network by combining stability guarantees with learning-based routing. It introduces a generalized shortest-path (GSP) policy whose path costs are given by piecewise-linear (PL) functions $Q_p(x)$, and proves throughput stability via Foster-Lyapunov analysis under parameter constraints linking $\beta$, $\gamma$, and the network’s $m$. A policy iteration (PI) algorithm uses the PL functions $Q_p(x)$ both in the Lyapunov function for stability analysis and as a proxy for the action-value function, so that $(\beta, \gamma)$ can be learned while preserving stability. The method achieves near-optimal routing with a substantial reduction in training time compared to neural-network baselines, as demonstrated on a bridge SOSD network, and shows favorable trade-offs between computational and implementation efficiency. These results advance scalable, provably stable, learning-based dynamic routing for complex networks.
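To make the idea of a generalized shortest-path decision concrete, here is a minimal sketch. It is not the paper's definition of $Q_p(x)$: the piecewise-linear form, the `knee` threshold, the three example paths, and the queue lengths are all hypothetical and chosen only for illustration. The two scalars `beta` and `gamma` stand in for the two learned parameters.

```python
from typing import Sequence


def pl_path_cost(x: Sequence[float], path: Sequence[int],
                 beta: float, gamma: float, knee: float = 5.0) -> float:
    """Hypothetical piecewise-linear path cost (an assumption, not the paper's Q_p).

    Each link n on the path contributes beta * x[n], plus an extra
    gamma * (x[n] - knee) once its queue exceeds the knee.
    """
    return sum(beta * x[n] + gamma * max(x[n] - knee, 0.0) for n in path)


def gsp_route(x: Sequence[float], paths: Sequence[Sequence[int]],
              beta: float, gamma: float) -> int:
    """Return the index of the path with the smallest cost (ties: lowest index)."""
    costs = [pl_path_cost(x, p, beta, gamma) for p in paths]
    return min(range(len(paths)), key=costs.__getitem__)


# Illustrative data: five links, three origin-to-destination paths (hypothetical).
paths = [(0, 3), (0, 2, 4), (1, 4)]
x = [2.0, 7.0, 0.0, 1.0, 3.0]  # current queue length on each link
choice = gsp_route(x, paths, beta=1.0, gamma=0.5)
```

With these numbers the first path has the lowest cost, so an arriving job would be routed to it. Changing `beta` or `gamma` changes how strongly congested links are penalized, which is the kind of adjustment a learning procedure would make.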

Abstract

We consider learning-based adaptive dynamic routing for a single-origin-single-destination queuing network with stability guarantees. Specifically, we study a class of generalized shortest path policies that can be parameterized by only two constants via a piecewise-linear function. Using the Foster-Lyapunov stability theory, we develop a criterion on the parameters to ensure mean boundedness of the traffic state. Then, we develop a policy iteration algorithm that learns the parameters from realized sample paths. Importantly, the piecewise-linear function is both integrated into the Lyapunov function for stability analysis and used as a proxy of the value function for policy iteration; hence, stability is inherently ensured for the learned policy. Finally, we demonstrate via a numerical example that the proposed algorithm learns a near-optimal routing policy with an acceptable optimality gap but significantly higher computational efficiency compared with a standard neural network-based algorithm.
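For readers unfamiliar with the Foster-Lyapunov approach, the standard drift criterion for a continuous-time Markov chain can be stated as follows. This is the generic textbook form, not the paper's specific Lyapunov function or parameter condition; the symbols $V$, $\mathcal{L}$, $c$, $d$, and $X(t)$ are notation assumed here. If there exist a function $V \ge 0$ and constants $c > 0$, $d < \infty$ such that the generator $\mathcal{L}$ of the chain satisfies the drift inequality for every state $x$, then the traffic state is bounded on average:

$$
\mathcal{L}V(x) \le -c\,\|x\|_1 + d \quad \text{for all } x
\qquad\Longrightarrow\qquad
\limsup_{t\to\infty} \frac{1}{t}\int_0^t \mathbb{E}\big[\|X(s)\|_1\big]\,ds \le \frac{d}{c}.
$$

The right-hand bound is one common formalization of "mean boundedness of the traffic state".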

Paper Structure (14 sections, 1 theorem, 31 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Modeling and formulation
Stability guarantee for GSP policy
Iterative computation of GSP parameters
GSP PI algorithm
Estimate of $\bar{\lambda},\bar{\mu}_n$
Update of $\beta$
Update of $\gamma$
Benchmark policies and algorithms
Neural network (NN) PI
Simple shortest-path (SSP) policy
Optimal Bernoulli (OB) policy
Evaluation and comparison
Concluding remarks

Key Result

Theorem 1

The bridge network is stable under the GSP policy if the network is stabilizable and

Figures (2)

Figure 1: A single-origin-single-destination network.
Figure 2: Comparison of average system time for the GSP PI and NN PI algorithms. The axes use a non-uniform scale because of the large values. (a) Average system time as the learning time grows. (b) Average system time as the number of iterations increases.

Theorems & Definitions (2)

Definition 1: Bottleneck
Theorem 1
