Learning payoffs while routing in skill-based queues

Sanne van Kempen, Jaron Sanders, Fiona Sloothaak, Maarten G. Wolf

TL;DR

The paper tackles online payoff maximization in skill-based queues with compatibility constraints and unknown payoffs on the compatibility lines. It develops an adaptive routing policy, UCB QR, that uses the finite set of basic feasible solutions of a static LP as its action space to guide exploration and exploitation while keeping the queues stable. The analysis is carried out in an episodic framework built on the queues' stationary behavior. A regret lower bound of $\Omega(\ln t)$ is shown for stable policies, and UCB QR achieves a polylogarithmic upper bound of $O(\ln^{2\beta} t)$ for any $\beta>1$, which matches the lower bound up to logarithmic factors and so establishes asymptotic optimality in that sense. The analysis ties the convergence of the queue-length process to the learning performance. Numerical experiments, including one with time-varying parameters, show rapid convergence to the optimum and robust performance, highlighting the method's potential for practical, large-scale service systems.
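
The exploration/exploitation step can be illustrated with a generic optimistic index over a finite action set. The sketch below is a minimal UCB1-style loop, assuming each action (standing in for one basic feasible solution) returns a noisy scalar payoff when played. It ignores queueing, stability, and the episodic warm-up entirely; the bonus $\sqrt{2\beta \ln t / n}$ and the names `ucb_index` and `run_ucb` are illustrative assumptions, not the paper's UCB QR.

```python
import math
import random


def ucb_index(mean: float, count: int, t: int, beta: float = 1.5) -> float:
    """Optimistic index: empirical mean plus an exploration bonus.

    The bonus sqrt(2 * beta * ln t / count) is a generic UCB1-style choice
    (an assumption here), not the confidence bound used by UCB QR.
    """
    if count == 0:
        return math.inf  # force every action to be tried at least once
    return mean + math.sqrt(2.0 * beta * math.log(t) / count)


def run_ucb(true_payoffs, horizon, beta=1.5, seed=0):
    """Play a finite set of actions (e.g. hypothetical LP vertices) with UCB.

    Returns how many times each action was played.
    """
    rng = random.Random(seed)
    n = len(true_payoffs)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, horizon + 1):
        # Pick the action with the largest optimistic index.
        a = max(range(n), key=lambda k: ucb_index(means[k], counts[k], t, beta))
        reward = true_payoffs[a] + rng.gauss(0.0, 0.1)  # noisy payoff observation
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # running average of payoffs
    return counts


print(run_ucb([0.5, 0.8, 0.6], 2000))
```

With these toy means, most plays go to the action with mean 0.8; the other actions are still tried, but only a logarithmically growing number of times, which is the mechanism behind logarithmic-order regret in the unqueued setting.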

Abstract

Motivated by applications in service systems, we consider queueing systems where each customer must be handled by a server with the right skill set. We focus on optimizing the routing of customers to servers in order to maximize the total payoff of customer–server matches. In addition, customer–server dependent payoff parameters are assumed to be unknown a priori. We construct a machine learning algorithm that adaptively learns the payoff parameters while maximizing the total payoff and prove that it achieves polylogarithmic regret. Moreover, we show that the algorithm is asymptotically optimal up to logarithmic terms by deriving a regret lower bound. The algorithm leverages the basic feasible solutions of a static linear program as the action space. The regret analysis overcomes the complex interplay between queueing and learning by analyzing the convergence of the queue length process to its stationary behavior. We also demonstrate the performance of the algorithm numerically and include an experiment with time-varying parameters that highlights the potential of the algorithm in non-static environments.
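
To make "basic feasible solutions of a static linear program" concrete, the sketch below enumerates the vertices of $\{x : Ax=b,\ x\ge 0\}$ by brute force over column subsets. The toy LP is a hypothetical fluid routing problem, not the paper's $\mathrm{LP}(\theta,\varepsilon)$: two customer types with $\lambda=(1,1)$, two servers with $\mu=(1.5,1.5)$, lines $(11),(12),(22)$, the assumption that all arrivals of each type are routed, and one slack variable per server. Brute-force enumeration is exponential in the problem size and only suitable for tiny instances.

```python
import itertools

import numpy as np


def basic_feasible_solutions(A, b, tol=1e-9):
    """Enumerate the basic feasible solutions of {x : A x = b, x >= 0}.

    Brute force: try every set of m columns and keep those that form a
    nonsingular basis whose basic solution is nonnegative.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    vertices = []
    for cols in itertools.combinations(range(n), m):
        basis = A[:, list(cols)]
        if np.linalg.matrix_rank(basis) < m:
            continue  # columns are linearly dependent: not a basis
        x_basic = np.linalg.solve(basis, b)
        if np.any(x_basic < -tol):
            continue  # basic solution has a negative entry: infeasible
        x = np.zeros(n)
        x[list(cols)] = x_basic
        # A degenerate vertex can arise from several bases; store it once.
        if not any(np.allclose(x, v, atol=1e-7) for v in vertices):
            vertices.append(x)
    return vertices


# Hypothetical toy routing LP (an assumption, not the paper's LP).
# Variables: x11, x12, x22 (flow on each line), s1, s2 (server slack).
A = [
    [1, 1, 0, 0, 0],  # type 1:   x11 + x12 = lambda_1
    [0, 0, 1, 0, 0],  # type 2:   x22 = lambda_2
    [1, 0, 0, 1, 0],  # server 1: x11 + s1 = mu_1
    [0, 1, 1, 0, 1],  # server 2: x12 + x22 + s2 = mu_2
]
b = [1.0, 1.0, 1.5, 1.5]
theta = np.array([1.0, 2.0, 1.0, 0.0, 0.0])  # payoff per unit flow; slacks earn 0

for v in basic_feasible_solutions(A, b):
    print(v, "payoff:", theta @ v)
```

This instance has exactly two vertices, $(x_{11},x_{12},x_{22})=(0.5,0.5,1)$ with payoff 2.5 and $(1,0,1)$ with payoff 2. A learner that knew $\theta$ would play the first; with $\theta$ unknown, such vertices are the finite set of candidate actions an optimistic rule chooses among.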

Paper Structure

This paper contains 46 sections, 17 theorems, 204 equations, 14 figures, 1 table, and 2 algorithms.

Key Result

Lemma 2.1

Let $\varepsilon \geq 0$, $\mathscr{L}\subseteq\mathcal{L}$, and $\mathscr{J}\subseteq\mathcal{J}$. Then $B=\mathscr{L}\cup\mathscr{J}$ is a basis of $\mathrm{LP}(\theta,\varepsilon)$ if and only if $\mathcal{G}(\mathscr{L},\mathscr{J})$ is a spanning forest of $(\mathcal{I}\cup\mathcal{J},\mathcal{L})$.
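
Part of the graph condition in Lemma 2.1 can be checked mechanically. The sketch below, assuming lines are given as pairs $(i,j)$ of a customer type and a server, uses union-find to test whether a set of lines is acyclic on the node set $\mathcal{I}\cup\mathcal{J}$ and, if so, counts its trees. It does not model how $\mathscr{J}$ enters $\mathcal{G}(\mathscr{L},\mathscr{J})$, which is defined in the paper; the helper `forest_components` is hypothetical. On the lines $\mathscr{L}=\{(11),(13),(23),(33)\}$ from the Figure 4 example it reports two trees, in line with that caption.

```python
def forest_components(customers, servers, lines):
    """Count the trees formed by `lines` on the node set customers ∪ servers.

    Returns None if some line closes a cycle (so the lines are not a forest).
    Each line (i, j) joins customer type i to server j.
    """
    # Customer and server labels may coincide, so tag each node with its side.
    parent = {("c", i): ("c", i) for i in customers}
    parent.update({("s", j): ("s", j) for j in servers})

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    trees = len(parent)  # start with every node as its own tree
    for i, j in lines:
        root_c, root_s = find(("c", i)), find(("s", j))
        if root_c == root_s:
            return None  # both ends already connected: this line closes a cycle
        parent[root_c] = root_s  # merge the two trees
        trees -= 1
    return trees


# Figure 4 example: lines (11), (13), (23), (33) with I = J = 3.
print(forest_components([1, 2, 3], [1, 2, 3], [(1, 1), (1, 3), (2, 3), (3, 3)]))  # prints 2
```

Adding line $(21)$ to this set closes the cycle customer 2, server 3, customer 1, server 1, so the function returns `None`. Union-find keeps each check close to constant time, which matters only if one wanted to screen many candidate line sets.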

Figures (14)

  • Figure 1: A skill-based queueing system with compatibility lines and customer–server dependent payoffs. Here, the $\lambda_i$ denote arrival rates, the $\mu_j$ denote service rates, and the $\theta_{ij}$ denote the average payoff generated upon service completion of a type-$i$ customer at server $j$.
  • Figure 2: Schematic overview of the learning algorithm.
  • Figure 3: A queueing model with $I=3$ customer types and $J=3$ servers with compatibility lines. Here, e.g., $\mathcal{S}_1=\{1,2,3\}$ and $\mathcal{C}_1=\{1,3\}$.
  • Figure 4: Consider the queueing system in Figure 3 and let $\mathscr{L}=\{(11),(13),(23),(33)\}$ and $\mathscr{J}=\{1,2\}$. Then $\mathcal{G}(\mathscr{L},\mathscr{J})$ is a spanning forest of $(\mathcal{I}\cup\mathcal{J},\mathcal{L})$ consisting of two trees.
  • Figure 5: A possible realization of the queue-length process of the virtual queue of server $j\in\mathcal{J}$ under the learning algorithm over two episodes. Episode $k$ consists of a warm-up time $\tau_k$ followed by a period of fixed length $H_0$. In every episode, the algorithm can choose a different action, each with its own stationary measure.
  • ...and 9 more figures

Theorems & Definitions (32)

  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Lemma 2.4
  • Lemma 3.1
  • Lemma 3.2
  • Theorem 1
  • Theorem 2
  • Lemma 4.1
  • Lemma 4.2
  • ...and 22 more