Learning-Based Adaptive Dynamic Routing with Stability Guarantee for a Single-Origin-Single-Destination Network

Yidan Wu, Feixiang Shu, Jianan Zhang, Li Jin

TL;DR

This work tackles dynamic routing in a single-origin-single-destination (SOSD) queuing network by combining stability guarantees with learning-based routing. It introduces a generalized shortest-path (GSP) policy whose path costs are given by piecewise-linear (PL) functions $Q_p(x)$, and proves throughput stability via Foster-Lyapunov analysis, with parameter constraints linking $\beta$, $\nabla$, and the network’s $m$. A policy iteration (PI) algorithm uses the PL functions $Q_p(x)$ as proxies for both the Lyapunov function in the stability analysis and the action-value function, enabling learning of $(\beta, \nabla)$ while preserving stability. The method achieves near-optimal routing with a substantial reduction in training time compared to neural-network baselines, as demonstrated on a bridge SOSD network, and shows favorable trade-offs between computational and implementation efficiency. These results advance scalable, provably stable, learning-based dynamic routing for complex networks.
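
To make the idea of routing by piecewise-linear path costs concrete, the following is a minimal sketch and not the paper's implementation. The link cost `max(x, slope * x - offset)`, the constants `slope` and `offset`, the five-link bridge topology, and the per-path decision at the origin are all assumptions chosen for illustration; the paper's actual form of $Q_p(x)$ and its parameters are not given in this summary.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # (tail node, head node)


def link_cost(x: float, slope: float, offset: float) -> float:
    """Hypothetical convex piecewise-linear cost of a link holding x jobs.

    Two constants (slope, offset) define the function; this form is an
    assumption for illustration, not the paper's Q_p(x).
    """
    return max(x, slope * x - offset)


def path_cost(path: List[Link], queues: Dict[Link, float],
              slope: float, offset: float) -> float:
    """Sum the piecewise-linear link costs along one path."""
    return sum(link_cost(queues[e], slope, offset) for e in path)


def gsp_route(paths: List[List[Link]], queues: Dict[Link, float],
              slope: float, offset: float) -> int:
    """Return the index of the path with the smallest generalized cost."""
    costs = [path_cost(p, queues, slope, offset) for p in paths]
    return min(range(len(paths)), key=costs.__getitem__)


# Assumed bridge topology: origin 1, destination 4, bridge link (2, 3).
PATHS = [
    [(1, 2), (2, 4)],
    [(1, 3), (3, 4)],
    [(1, 2), (2, 3), (3, 4)],
]
queues = {(1, 2): 3, (1, 3): 1, (2, 3): 0, (2, 4): 2, (3, 4): 4}

chosen = gsp_route(PATHS, queues, slope=2.0, offset=2.0)
print(chosen)  # index into PATHS of the cheapest path
```

With these illustrative queue lengths, the three path costs are 6, 7, and 10, so the sketch sends the arriving job along the first path. Changing `slope` and `offset` changes how strongly long queues are penalized, which is the kind of knob a learning procedure could tune.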

Abstract

We consider learning-based adaptive dynamic routing for a single-origin-single-destination queuing network with stability guarantees. Specifically, we study a class of generalized shortest path policies that can be parameterized by only two constants via a piecewise-linear function. Using Foster-Lyapunov stability theory, we develop a criterion on the parameters to ensure mean boundedness of the traffic state. Then, we develop a policy iteration algorithm that learns the parameters from realized sample paths. Importantly, the piecewise-linear function is both integrated into the Lyapunov function for stability analysis and used as a proxy for the value function in policy iteration; hence, stability is inherently ensured for the learned policy. Finally, we demonstrate via a numerical example that the proposed algorithm learns a near-optimal routing policy with an acceptable optimality gap and significantly higher computational efficiency than a standard neural network-based algorithm.
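
For readers unfamiliar with how a Foster-Lyapunov argument yields mean boundedness, a generic discrete-time version of the drift condition is sketched below. The symbols $V$, $c$, $d$, and the discrete-time setting are assumptions for illustration; this summary does not state the paper's exact Lyapunov function or time model.

```latex
% Assumed drift condition for a Markov chain X_t with Lyapunov function V >= 0:
\mathbb{E}\big[V(X_{t+1}) - V(X_t) \,\big|\, X_t = x\big] \le -c\,\|x\|_1 + d,
\qquad c > 0,\ d < \infty .

% Take expectations and sum over t = 0, ..., T-1 (telescoping):
\mathbb{E}[V(X_T)] - \mathbb{E}[V(X_0)]
  \le -c \sum_{t=0}^{T-1} \mathbb{E}\|X_t\|_1 + dT .

% Since V >= 0, rearranging bounds the time-averaged mean state:
\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\|X_t\|_1
  \le \frac{d}{c} + \frac{\mathbb{E}[V(X_0)]}{cT} .
```

As $T \to \infty$ the last term vanishes, so the long-run average of the expected traffic state is at most $d/c$. A criterion on the policy parameters, as described in the abstract, would be one that guarantees such a drift inequality holds.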

Paper Structure

This paper contains 14 sections, 1 theorem, 31 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

The bridge network is stable under the GSP policy if the network is stabilizable and

Figures (2)

  • Figure 1: A single-origin-single-destination network.
  • Figure 2: Comparison of average system time for the GSP PI and NN algorithms. The axes use an uneven scale because of the large values. (a) Average system time as learning time grows. (b) Average system time as the number of iterations increases.

Theorems & Definitions (2)

  • Definition 1: Bottleneck
  • Theorem 1