The Transient Cost of Learning in Queueing Systems

Daniel Freund; Thodoris Lykouris; Wentao Weng

TL;DR

This paper introduces the Transient Cost of Learning in Queueing (TCLQ), a finite-time metric that captures how parameter uncertainty increases time-averaged queue lengths. It provides a tight TCLQ lower bound for single-queue systems and shows that a UCB-based policy achieves near-optimal TCLQ, highlighting a learning-induced transient cost that scales with the number of servers and the slack $\varepsilon$. The framework is extended to multi-queue and networked queueing through MaxWeight-UCB and BackPressure-UCB, achieving near-optimal $\tilde{O}(1/\varepsilon)$ TCLQ and demonstrating strong transient performance across complex queueing networks. Combined with a Lyapunov–bandit analysis, these results give practical, scalable guidance for learning in stochastic service systems and offer pathways to handle nonstationarity and richer contexts in future work.

Abstract

Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters. This assumption rarely holds in practice where there is parameter uncertainty, thus motivating a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms but does not provide insight on the transient performance in the early stages of the learning process. In this paper, we propose the Transient Cost of Learning in Queueing (TCLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the TCLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for TCLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
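The gap that TCLQ measures can be illustrated with a small simulation. The sketch below is not the authors' code, and it rests on assumptions not stated in this summary: a discrete-time single queue where a job arrives with probability `lam` each period, the scheduler sends the head-of-line job to one server `k`, and service succeeds with probability `mu[k]`. The index `mean + sqrt(2 ln t / n)` is a textbook UCB choice, and the names `ucb_choice`, `simulate`, and `transient_gap` are hypothetical. It compares UCB with an oracle that always uses the fastest server and reports the largest difference in time-averaged queue length over the horizon, for one fixed instance rather than the worst case over instances.

```python
import math
import random


def ucb_choice(pulls, wins, t):
    """Pick the server with the largest UCB index (untried servers first)."""
    def index(k):
        if pulls[k] == 0:
            return float("inf")
        # Textbook UCB bonus; the paper's exact confidence radius may differ.
        return wins[k] / pulls[k] + math.sqrt(2 * math.log(t) / pulls[k])
    return max(range(len(pulls)), key=index)


def simulate(lam, mu, horizon, policy, seed=0):
    """Return the running time-averaged queue length for periods 1..horizon."""
    rng = random.Random(seed)
    pulls = [0] * len(mu)
    wins = [0] * len(mu)
    best = max(range(len(mu)), key=lambda k: mu[k])  # known only to the oracle
    queue, total, averages = 0, 0, []
    for t in range(1, horizon + 1):
        # Assumed order of events: arrival first, then one service attempt.
        if rng.random() < lam:
            queue += 1
        if queue > 0:
            k = best if policy == "oracle" else ucb_choice(pulls, wins, t)
            served = rng.random() < mu[k]
            pulls[k] += 1
            wins[k] += served
            queue -= served
        total += queue
        averages.append(total / t)
    return averages


def transient_gap(lam, mu, horizon, runs=30):
    """Max over T <= horizon of (UCB minus oracle) time-averaged queue length."""
    ucb = [0.0] * horizon
    oracle = [0.0] * horizon
    for seed in range(runs):
        for i, q in enumerate(simulate(lam, mu, horizon, "ucb", seed)):
            ucb[i] += q / runs
        for i, q in enumerate(simulate(lam, mu, horizon, "oracle", seed)):
            oracle[i] += q / runs
    return max(u - o for u, o in zip(ucb, oracle))


if __name__ == "__main__":
    # Parameters taken from the Figure 1 caption: K = 5, lambda = 0.45.
    mu = [0.045, 0.35, 0.35, 0.35, 0.55]
    print(transient_gap(0.45, mu, horizon=5000))
```

Taking the maximum over all horizons, rather than the value at the final period, is what makes the measure sensitive to early-stage learning: a policy can match the oracle asymptotically and still show a large gap here.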

Paper Structure (61 sections, 45 theorems, 167 equations, 6 figures, 1 table, 4 algorithms)

Introduction
Our contribution
Transient Cost of Learning in Queueing.
Lower bound of $\textsc{TCLQ}$ (Theorem \ref{thm:colq-lowerbound}).
An efficient algorithm for single-queue multi-server systems (Theorem \ref{thm:colq-ucb-single}).
Efficient algorithms for multi-queue systems and queueing networks (Theorems \ref{thm:mw-ucb}, \ref{thm:bp-ucb-col}).
Related work
Warm-up model: single-queue multi-server systems
Model dynamics
Extensions to general queueing networks.
Objective: The Transient Cost of Learning in Queueing
Optimal transient cost of learning for our warm-up model
Queue length bound in the learning stage (Lemma \ref{lem:single-queue-learn})
Queue length bound in the regenerate stage (Lemma \ref{lem:single-queue-regen})
Satisficing regret of $\textsc{UCB}$
...and 46 more sections

Key Result

Theorem 1

For any $K \geq 2^{14},\varepsilon \in (0,0.25]$ and feasible policy $\pi$, $\textsc{TCLQ}^{\mathrm{single}}(K,\varepsilon,\pi) \geq \frac{K}{2^{14}\varepsilon}$.
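As a quick sanity check on the scale of this bound, the arithmetic can be worked out at the smallest admissible number of servers and the largest admissible slack, and then at one other point chosen purely for illustration:

$$
K = 2^{14},\ \varepsilon = 0.25:\quad \frac{K}{2^{14}\varepsilon} = \frac{2^{14}}{2^{14}\cdot 0.25} = 4,
\qquad
K = 2^{15},\ \varepsilon = 0.1:\quad \frac{K}{2^{14}\varepsilon} = \frac{2}{0.1} = 20.
$$

The bound grows linearly in $K$ and in $1/\varepsilon$, so doubling the number of servers or halving the slack doubles the guaranteed transient cost.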

Figures (6)

Figure 1: Expected per-period and time-averaged queue lengths of UCB and Q-UCB [KrishnasamySJS21] in a single-queue setting with $K=5$, $\lambda=0.45$, $\boldsymbol{\mu} = (0.045,0.35,0.35,0.35,0.55)$; results are averaged over 30 runs. The two algorithms' queue lengths are indistinguishable asymptotically (left panel), but they clearly differ in learning efficiency during early periods, as captured by the Transient Cost of Learning in Queueing, the metric that our work introduces (right panel).
Figure 2: Flowchart for a single-queue multi-server system.
Figure 3: Comparison between UCB, Q-UCB, and the algorithm of [StahlbuhkSM21] in the setting of Figure \ref{fig:col-motivation}.
Figure 4: Scaling with respect to $\varepsilon$
Figure 5: Scaling with respect to $K$
...and 1 more figures

Theorems & Definitions (104)

Definition 1
Remark 1
Remark 2
Remark 3
Theorem 1
Remark 4
Theorem 2
Lemma 3.1
Lemma 3.2
Lemma 3.3
...and 94 more