Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Neharika Jali; Guannan Qu; Weina Wang; Gauri Joshi

TL;DR

The paper tackles routing of jobs in a central queueing system with $k$ heterogeneous servers, aiming to minimize long-run average response time. It introduces ACHQ, a low-dimensional, policy-gradient actor-critic method that exploits the queueing structure via a soft-threshold parameterization, enabling scalable learning in large state spaces. Theoretical results guarantee convergence to a stationary point for general $k$ and establish approximate global optimality for the two-server case, while experiments show ACHQ achieving up to ~30% reduction in expected response time compared to greedy routing to the fastest server. The work highlights the practical potential of structure-aware reinforcement learning for complex queueing control problems and sets the stage for future proofs of threshold-optimality in multi-server settings.
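The TL;DR stresses learning in large state spaces. A back-of-the-envelope count makes this concrete. The encoding below is an illustrative assumption, not the paper's definition: the state is the central-queue length, truncated at a hypothetical cap max_queue, plus a busy/idle bit for each of the $k$ servers. The count grows as $2^k$. By contrast, the threshold policy of Figure 1 needs only one threshold $\theta_i$ per slower server.

```python
def state_space_size(k: int, max_queue: int) -> int:
    """Count states (q, b_1, ..., b_k): queue lengths 0..max_queue times 2**k
    busy/idle patterns. Both the encoding and max_queue are illustrative assumptions."""
    return (max_queue + 1) * 2 ** k


if __name__ == "__main__":
    # Tabular methods need one value entry per state; this grows exponentially in k.
    for k in (2, 8, 16, 32):
        print(k, state_space_size(k, max_queue=100))
```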

Abstract

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike in homogeneous systems, a threshold policy that routes jobs to the slow server(s) when the queue length exceeds a certain threshold is known to be optimal for the one-fast-one-slow two-server system. However, an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy-gradient-based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server.
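The abstract describes ACHQ's policy as a low-dimensional soft threshold parameterization. This excerpt does not give the exact form, so the sketch below assumes one plausible choice. It uses a logistic function of the gap between the queue length and a threshold $\theta$, with a hypothetical temperature parameter that controls how sharp the step is. As the temperature goes to 0, it recovers the hard rule "route to the slow server iff the queue length exceeds $\theta$". The sketch also computes the score $\partial_\theta \log \pi$. In an actor-critic update, this is the term the actor multiplies by the critic's estimate.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def route_prob(queue_len: int, theta: float, temperature: float = 1.0) -> float:
    """Hypothetical soft threshold: probability of sending a job to an idle slow server."""
    return sigmoid((queue_len - theta) / temperature)


def score(queue_len: int, theta: float, routed: bool, temperature: float = 1.0) -> float:
    """d/d(theta) of log pi(action | queue_len) for the two actions (route / hold)."""
    p = route_prob(queue_len, theta, temperature)
    # d log(sigma(z))/dz = 1 - sigma(z); d log(1 - sigma(z))/dz = -sigma(z); dz/dtheta = -1/T
    return -(1.0 - p) / temperature if routed else p / temperature


if __name__ == "__main__":
    theta = 5.0
    for q in (0, 5, 10):
        print(q, round(route_prob(q, theta, temperature=0.5), 4))
```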

Paper Structure (36 sections, 2 theorems, 34 equations, 4 figures, 1 table, 2 algorithms)

INTRODUCTION
Main Contributions.
RELATED WORK
PROBLEM FORMULATION
Queueing Setup
Model.
Performance Metric.
Slow Now versus Fast Later.
Threshold Policy.
Queue as a Markov Decision Process
Relative Value Iteration: The Curse of Dimensionality
ACTOR CRITIC FOR HETEROGENEOUS QUEUES
Preliminaries
Actor.
Critic.
...and 21 more sections

Key Result

Theorem 5.1

Under Assumptions 5.1-5.5, with actor step size $\alpha^{(t)} = \mathcal{O}(1/(1+t)^{r_\alpha})$ and critic step size $\beta^{(t)} = \mathcal{O}(1/(1+t)^{r_\beta})$, where $0 < r_\beta < r_\alpha < 1$, the policy-gradient algorithm (algo:PG) finds an $\epsilon$-approximate stationary point of $J(\cdot)$ within $\tau$ steps as …, where $r_{\alpha} = 3/5$, $r_{\beta} = 2/5$, and the total number of iterations …
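Theorem 5.1 relies on two step-size schedules that decay at different rates. The sketch below instantiates them with the exponents stated in the theorem, $r_\alpha = 3/5$ and $r_\beta = 2/5$. It assumes a leading constant of 1, since the $\mathcal{O}(\cdot)$ hides the actual constants. The function names actor_step and critic_step are hypothetical. Because $r_\beta < r_\alpha$, the critic's step size shrinks more slowly than the actor's. The ratio $\alpha^{(t)}/\beta^{(t)} = (1+t)^{-1/5}$ therefore tends to zero: the critic adapts on the faster timescale while the actor's policy parameters change slowly.

```python
def actor_step(t: int, c: float = 1.0, r_alpha: float = 3 / 5) -> float:
    """alpha^(t) = c / (1 + t)^r_alpha; the constant c = 1 is an assumption."""
    return c / (1 + t) ** r_alpha


def critic_step(t: int, c: float = 1.0, r_beta: float = 2 / 5) -> float:
    """beta^(t) = c / (1 + t)^r_beta; decays more slowly since r_beta < r_alpha."""
    return c / (1 + t) ** r_beta


if __name__ == "__main__":
    # The ratio alpha/beta = (1 + t)^(-1/5) shrinks toward 0 (two timescales).
    for t in (0, 10, 1000, 100000):
        a, b = actor_step(t), critic_step(t)
        print(t, a, b, a / b)
```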

Figures (4)

Figure 1: $k$-server heterogeneous queueing system with service rates $\mu_i$ and job arrival rate $\lambda$. We also illustrate a threshold routing policy that routes jobs to a slower server $i$ when the queue length exceeds $\theta_i$.
Figure 2: For multi-server heterogeneous systems, the optimal policy is observed to be of threshold type, where jobs are routed to the fastest available server only when the queue length exceeds the threshold.
Figure 3: ACHQ shows up to $\sim 30\%$ improvement in expected response time over the FAS (fastest available server) and RSRT baselines.
Figure 4: Convergence of ACHQ on an instance with $8$ servers, $\bm{\mu} = \hbox{linspace}(100, 1)$, and load $\rho = 0.4$.
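The threshold routing policy of Figure 1 and the greedy FAS baseline of Figure 3 can be contrasted on the Figure 4 instance. The sketch below rests on several assumptions not stated in the captions. It reads $\hbox{linspace}(100, 1)$ as 8 evenly spaced rates from 100 down to 1. It takes load as $\rho = \lambda / \sum_i \mu_i$, which gives $\lambda = 0.4 \cdot 404 = 161.6$. The threshold values are hypothetical, and the tie-breaking rule (the fastest idle server whose condition holds) is an illustrative reading. When the fastest server is busy, FAS always dispatches to the next idle server. The threshold policy instead holds the job in the central queue until the queue is long enough.

```python
from typing import List, Optional


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Evenly spaced values from start to stop (inclusive), mirroring numpy.linspace."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def fastest_available(idle: List[bool]) -> Optional[int]:
    """Greedy FAS baseline: index of the fastest idle server (servers sorted fastest first)."""
    for i, free in enumerate(idle):
        if free:
            return i
    return None  # every server is busy


def threshold_route(queue_len: int, idle: List[bool], theta: List[float]) -> Optional[int]:
    """Send the job to the fastest idle server i that is either the fastest server
    (i == 0) or whose threshold is exceeded (queue_len > theta[i]).
    Returning None means the job keeps waiting in the central queue."""
    for i, free in enumerate(idle):
        if free and (i == 0 or queue_len > theta[i]):
            return i
    return None


if __name__ == "__main__":
    mu = linspace(100, 1, 8)             # 8 service rates, fastest first
    rho = 0.4
    lam = rho * sum(mu)                  # assumes rho = lambda / sum(mu)
    theta = [0, 2, 4, 6, 8, 10, 12, 14]  # hypothetical thresholds; theta[0] unused
    idle = [False] + [True] * 7          # fastest server busy, all others idle
    print(f"lambda = {lam:.1f}")
    for q in (1, 3, 9):
        print(q, fastest_available(idle), threshold_route(q, idle, theta))
```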

Theorems & Definitions (2)

Theorem 5.1: Wu2022A2CPG, Corollary 4.9
Theorem 5.2: BhandariRussoGlobal, Theorem 5
