The Transient Cost of Learning in Queueing Systems
Daniel Freund, Thodoris Lykouris, Wentao Weng
TL;DR
This paper introduces the Transient Cost of Learning in Queueing (TCLQ), a finite-time metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. For single-queue multi-server systems, it gives a tight lower bound on TCLQ and shows that a UCB-based policy achieves near-optimal TCLQ; the resulting transient cost of learning depends on the number of servers and on the slack $\varepsilon$. The framework extends to multi-queue systems and queueing networks through MaxWeight-UCB and BackPressure-UCB, which achieve near-optimal $\tilde{O}(1/\varepsilon)$ TCLQ. The underlying analysis, which combines Lyapunov and bandit arguments, gives practical guidance for learning in stochastic service systems and leaves nonstationarity and richer contexts as directions for future work.
Abstract
Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters. This assumption rarely holds in practice, where parameters are uncertain, and this has motivated a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms but does not provide insight on the transient performance in the early stages of the learning process. In this paper, we propose the Transient Cost of Learning in Queueing (TCLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the TCLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for TCLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
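To make the quantity behind TCLQ concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a discrete-time single queue with Bernoulli arrivals at rate `lam`, servers with Bernoulli service probabilities `mu`, and one server selected per period; this model, the standard UCB1 index, and the names `simulate`, `ucb_choice`, and `learning_gap` are all illustrative assumptions. It compares the time-averaged queue length under a UCB policy with that of an oracle that knows the best server. The result is one sample path at one horizon, whereas TCLQ as described above is a maximum of the expected gap, so this is only a rough illustration.

```python
import math
import random


def ucb_choice(pulls, wins, t):
    """Standard UCB1 index (an assumption; the paper's exact index may differ)."""
    for i, n in enumerate(pulls):
        if n == 0:
            return i  # try every server once before using the index
    return max(
        range(len(pulls)),
        key=lambda i: wins[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
    )


def simulate(lam, mu, horizon, policy, seed=0):
    """Return the time-averaged queue length over `horizon` periods."""
    rng = random.Random(seed)
    k = len(mu)
    pulls = [0] * k  # times each server was selected
    wins = [0] * k   # successful services per server
    best = max(range(k), key=lambda i: mu[i])
    queue = 0
    total = 0
    for t in range(1, horizon + 1):
        # Draw both random numbers every period so that two policies run
        # with the same seed consume the random stream identically.
        u_arrival = rng.random()
        u_service = rng.random()
        if u_arrival < lam:
            queue += 1
        if queue > 0:
            server = best if policy == "oracle" else ucb_choice(pulls, wins, t)
            success = u_service < mu[server]
            pulls[server] += 1
            wins[server] += int(success)
            queue -= int(success)
        total += queue
    return total / horizon


def learning_gap(lam, mu, horizon, seed=0):
    """Sample-path gap in time-averaged queue length: UCB minus oracle."""
    return simulate(lam, mu, horizon, "ucb", seed) - simulate(
        lam, mu, horizon, "oracle", seed
    )
```

For instance, `learning_gap(0.5, [0.3, 0.7], 2000)` gives a single-run estimate of the extra queue length incurred while learning; averaging over many seeds and taking the maximum over horizons would move closer to the metric described in the abstract.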
