On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation

Yidan Wu; Jianan Zhang; Li Jin

TL;DR

This work addresses the stability and convergence of learning-based dynamic routing over parallel servers using a semi-gradient SARSA algorithm with a linear value-function approximator. It introduces a Lyapunov-based drift framework to guarantee mean-bounded traffic states and proves a joint convergence result: the weight vector and the traffic state converge if and only if the system is stabilizable, leveraging stochastic approximation theory and an on-policy WSQ structure. Empirical results show fast convergence and a small optimality gap compared to neural-network baselines, while maintaining computational efficiency. The method yields interpretable, idling-capable routing policies with strong theoretical guarantees, applicable to unknown and potentially unbounded state spaces.

Abstract

Learning-based approaches are increasingly popular for traffic control problems. However, these approaches are typically applied as black boxes with limited theoretical guarantees and interpretability. In this paper, we consider the theory of dynamic routing over parallel servers, a representative traffic control task, using a semi-gradient on-policy control algorithm, a representative reinforcement learning method. We consider a linear value function approximation on an infinite state space; a Lyapunov function is also derived from the approximator. In particular, the structure of the approximator naturally makes idling policies possible, which is an interesting and useful advantage over existing dynamic routing schemes. We show that the convergence of the approximation weights is coupled with the convergence of the traffic state. We show that if the system is stabilizable, then (i) the weight vector converges to a bounded region, and (ii) the traffic state is bounded in the mean. We also empirically show that the proposed algorithm is computationally efficient with an insignificant optimality gap.
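To make the setting concrete, the following is a minimal sketch of semi-gradient SARSA with a linear value function approximator for routing over parallel queues. It is an illustration under stated assumptions, not the authors' implementation: the quadratic feature map, the cost (total queue length after routing), the discount factor, the epsilon-greedy exploration, the step size $\alpha_k = \alpha_0/(k+1)$, and the projection of the weights onto a box are all choices made here for the example, and every function name is hypothetical.

```python
import random


def features(x, a):
    """Assumed quadratic features: phi_i = (x_i + 1{i == a})^2."""
    return [float(xi + (1 if i == a else 0)) ** 2 for i, xi in enumerate(x)]


def q_value(w, x, a):
    """Linear approximation Q(x, a) = w . phi(x, a)."""
    return sum(wi * pi for wi, pi in zip(w, features(x, a)))


def greedy_action(w, x):
    """Route to the server with the smallest approximate cost-to-go."""
    return min(range(len(x)), key=lambda a: q_value(w, x, a))


def sarsa_update(w, x, a, cost, x_next, a_next, alpha, gamma, w_min, w_max):
    """One semi-gradient SARSA step, then projection onto [w_min, w_max].

    The projection is a safeguard assumed for this sketch only.
    """
    delta = cost + gamma * q_value(w, x_next, a_next) - q_value(w, x, a)
    phi = features(x, a)
    return [min(w_max, max(w_min, wi + alpha * delta * pi))
            for wi, pi in zip(w, phi)]


def simulate(lam, mu, steps, seed=0, gamma=0.9, eps=0.1,
             alpha0=1e-3, w_min=0.01, w_max=10.0):
    """Learn routing weights for one arrival stream and len(mu) servers.

    Decisions are made at arrival epochs; services between arrivals are
    sampled from the uniformized chain (idle servers produce no departure).
    """
    rng = random.Random(seed)
    n = len(mu)
    w = [1.0] * n
    x = [0] * n
    a = greedy_action(w, x)
    total_rate = lam + sum(mu)
    for k in range(steps):
        # Route the arriving job to server a and record the one-step cost.
        x_post = list(x)
        x_post[a] += 1
        cost = float(sum(x_post))
        # Sample service events until the next arrival occurs.
        x_next = list(x_post)
        while rng.random() >= lam / total_rate:
            i = rng.choices(range(n), weights=mu)[0]
            if x_next[i] > 0:
                x_next[i] -= 1
        # On-policy (epsilon-greedy) choice of the next action.
        if rng.random() < eps:
            a_next = rng.randrange(n)
        else:
            a_next = greedy_action(w, x_next)
        alpha = alpha0 / (k + 1)  # diminishing step size (assumed form)
        w = sarsa_update(w, x, a, cost, x_next, a_next,
                         alpha, gamma, w_min, w_max)
        x, a = x_next, a_next
    return w, x
```

For instance, `simulate(1.0, [1.0, 0.6], steps=5000)` returns the learned weights and the final queue lengths for a two-server system whose total service rate exceeds the arrival rate. With positive weights, the greedy rule compares $w_a(2x_a+1)$ across servers, so it behaves like a weighted shortest-queue rule under the assumed features.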

Paper Structure (11 sections, 5 theorems, 57 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Modeling and Formulation
System modeling
MDP formulation
Semi-gradient SARSA algorithm
Joint convergence guarantee
Convergence of $x[k]$
Unboundedness of $\sum_{k=0}^\infty\alpha_k$
Convergence of $w[k]$
Experiments
Conclusion

Key Result

Theorem 1

(Joint convergence) Consider a stabilizable parallel service system with arrival rate $\lambda>0$ and service rates $\mu_1,\mu_2,\ldots,\mu_N>0$. Suppose that the step-size condition [eq_sumk=0] holds. Then the traffic state $x[k]$ converges in the sense of [equa: stable state] and the weight $w[k]$ co…
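For orientation, the objects in the theorem can be written in generic form. The display below shows the textbook semi-gradient SARSA update for a linear approximator $\hat{Q}(x,a;w)=w^\top\phi(x,a)$, followed by the step-size conditions commonly assumed in stochastic approximation (Robbins-Monro). The feature map $\phi$, one-step cost $c$, and discount factor $\gamma$ are generic symbols assumed here, and the paper's own step-size condition may differ from the second line.

$$
w[k+1] = w[k] + \alpha_k\Big(c[k+1] + \gamma\,\hat{Q}\big(x[k+1],a[k+1];w[k]\big) - \hat{Q}\big(x[k],a[k];w[k]\big)\Big)\,\phi\big(x[k],a[k]\big)
$$

$$
\sum_{k=0}^{\infty}\alpha_k = \infty, \qquad \sum_{k=0}^{\infty}\alpha_k^2 < \infty
$$

A schedule such as $\alpha_k = \alpha_0/(k+1)$ with $\alpha_0>0$ satisfies both conditions: the harmonic series diverges, while the series of its squares converges.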

Figures (2)

Figure 1: A parallel service system.
Figure 2: Performance comparison between SGS and NN.

Theorems & Definitions (5)

Theorem 1
Lemma 1
Proposition 1
Proposition 2
Proposition 3
