Learning-Based Adaptive Dynamic Routing with Stability Guarantee for a Single-Origin-Single-Destination Network
Yidan Wu, Feixiang Shu, Jianan Zhang, Li Jin
TL;DR
This work tackles dynamic routing in a single-origin-single-destination (SOSD) queuing network by combining stability guarantees with learning-based routing. It introduces a generalized shortest-path (GSP) policy whose path costs are given by piecewise-linear (PL) functions $Q_p(x)$, and proves throughput stability via Foster-Lyapunov analysis, with parameter constraints linking $\beta$, $\nabla$, and the network’s $m$. A policy iteration (PI) algorithm uses the PL functions $Q_p(x)$ both in the Lyapunov function for stability analysis and as a proxy for the action-value function, so that $(\beta, \nabla)$ can be learned while preserving stability. On a bridge SOSD network, the method achieves near-optimal routing with a substantial reduction in training time compared to neural-network baselines, and shows favorable trade-offs between computational and implementation efficiency. These results advance scalable, provably stable, learning-based dynamic routing for complex networks.
Abstract
We consider learning-based adaptive dynamic routing for a single-origin-single-destination queuing network with stability guarantees. Specifically, we study a class of generalized shortest path policies that can be parameterized by only two constants via a piecewise-linear function. Using the Foster-Lyapunov stability theory, we develop a criterion on the parameters to ensure mean boundedness of the traffic state. Then, we develop a policy iteration algorithm that learns the parameters from realized sample paths. Importantly, the piecewise-linear function is both integrated into the Lyapunov function for stability analysis and used as a proxy for the value function in policy iteration; hence, stability is inherently ensured for the learned policy. Finally, we demonstrate via a numerical example that the proposed algorithm learns a near-optimal routing policy with an acceptable optimality gap and significantly higher computational efficiency than a standard neural-network-based algorithm.
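To make the generalized shortest path idea concrete, the following is a minimal Python sketch of routing an arriving job to the path with the smallest piecewise-linear cost. The cost function `pl_path_cost`, its parameters `a` and `b`, the server names, and the three-path bridge-like topology are all illustrative assumptions; the exact form of $Q_p(x)$ and its two constants is not stated in this summary, so this should not be read as the paper's policy.

```python
from typing import Callable, Dict, Sequence, Tuple

Path = Tuple[str, ...]     # a path is an ordered tuple of server names
State = Dict[str, float]   # queue length at each server


def pl_path_cost(x: State, path: Path, a: float, b: float) -> float:
    """Hypothetical piecewise-linear path cost, NOT the paper's Q_p(x).

    Each server s on the path contributes a * x_s + b * max(x_s - 1, 0),
    which is piecewise linear in x_s with a single kink at x_s = 1.
    """
    return sum(a * x[s] + b * max(x[s] - 1.0, 0.0) for s in path)


def gsp_route(x: State, paths: Sequence[Path],
              cost: Callable[[State, Path], float]) -> Path:
    """Send the arriving job to the path with the smallest current cost.

    min() returns the first minimizer, so ties go to the earlier path.
    """
    return min(paths, key=lambda p: cost(x, p))


# Assumed bridge-like topology: two outer paths plus one using the bridge server s3.
paths = [("s1", "s4"), ("s2", "s5"), ("s1", "s3", "s5")]
x = {"s1": 2.0, "s2": 0.0, "s3": 1.0, "s4": 3.0, "s5": 1.0}

chosen = gsp_route(x, paths, lambda state, p: pl_path_cost(state, p, a=1.0, b=0.5))
print(chosen)  # ('s2', 's5'): path costs are 6.5, 1.0, and 4.5
```

Passing the cost as a callable keeps the routing rule separate from the cost parameterization, so a learned parameter pair could be swapped in without changing `gsp_route`.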
