Design and Scheduling of an AI-based Queueing System

Jiung Lee; Hongseok Namkoong; Yibo Zeng

TL;DR

The paper tackles scheduling in AI-assisted service systems where predictions determine job classes and misclassifications create congestion externalities. It develops a heavy-traffic diffusion framework that yields a lower bound on queueing costs under class uncertainty, and it proves that the P$c\mu$-rule is asymptotically optimal in this setting by balancing predicted-class workloads via a convex allocation problem. The authors also demonstrate an integrated design approach: selecting predictive models by downstream queueing cost, and designing AI-based triage that jointly optimizes filtering and staffing under multiple cost components. Empirical results on real toxicity data show that the P$c\mu$-rule outperforms naive prediction-based policies and deep reinforcement learning (DRL) across varied cost structures and nonstationarities. The framework also enables practical model selection and triage system tuning without expensive queueing simulations.

Abstract

To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.
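To make the idea of an index-based policy over predicted classes concrete, here is a minimal Python sketch. It is not the paper's P$c\mu$-rule, whose definition is not reproduced on this page. It assumes, purely for illustration, a linear holding cost per true class, a known posterior over true classes for each predicted class, and an index equal to the posterior-weighted cost times the service rate. All names (`PredictedClassQueue`, `effective_cost`, `pick_next_class`) and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PredictedClassQueue:
    """Virtual queue holding jobs that the classifier assigned to one predicted class."""
    name: str
    queue_length: int                       # jobs currently waiting
    service_rate: float                     # assumed average service rate for this predicted class
    true_class_posterior: Dict[str, float]  # assumed P(true class | predicted class)


def effective_cost(queue: PredictedClassQueue, true_class_cost: Dict[str, float]) -> float:
    """Posterior-weighted holding cost: mispredictions dilute the cost of a predicted class."""
    return sum(p * true_class_cost[k] for k, p in queue.true_class_posterior.items())


def index_of(queue: PredictedClassQueue, true_class_cost: Dict[str, float]) -> float:
    """Illustrative c-mu style index: effective cost rate times service rate."""
    return effective_cost(queue, true_class_cost) * queue.service_rate


def pick_next_class(
    queues: List[PredictedClassQueue], true_class_cost: Dict[str, float]
) -> Optional[str]:
    """Return the nonempty predicted class with the largest index, or None if all are empty."""
    nonempty = [q for q in queues if q.queue_length > 0]
    if not nonempty:
        return None
    return max(nonempty, key=lambda q: index_of(q, true_class_cost)).name


if __name__ == "__main__":
    # Hypothetical numbers for a two-class toxicity example.
    costs = {"toxic": 10.0, "safe": 1.0}
    queues = [
        PredictedClassQueue("flagged", 3, 1.0, {"toxic": 0.8, "safe": 0.2}),
        PredictedClassQueue("clean", 5, 2.0, {"toxic": 0.1, "safe": 0.9}),
    ]
    print(pick_next_class(queues, costs))
```

In this sketch, a less accurate classifier flattens the posteriors, which pulls the indices of the predicted classes toward each other and weakens the prioritization. That is one simple way to see why prediction quality matters for downstream queueing cost.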

Paper Structure (100 sections, 32 theorems, 149 equations, 10 figures, 4 tables)

Introduction
Index policies
Theoretical contributions
Integrated design of AI models and queueing systems
Model
Indices and classes.
Data-generating process.
True vs. predicted quantities.
Heavy-traffic scaling and diffusion notation.
Heavy traffic condition.
Skorokhod space and weak convergence.
Discussion on Model Validity and Limitations
Model Validity
Model Limitations
Lower bound on queueing cost
...and 85 more sections

Key Result

Lemma 1

Suppose that the assumptions on the data-generating process, heavy traffic, and second-order moments hold. Then, there exist Brownian motions $(\bm{U}_0, \bm{\widetilde{Z}}, \bm{\widetilde{R}}, \bm{V}_0)$ such that

Figures (10)

Figure 1: Schematic of a content moderation system as a triage system. Each piece of content may violate the user agreement (red toxicity symbol) or be safe (green checkmark). Uncovering this ground truth requires human review ("service"). Content is flagged for review by users or automated filters, which we view as "entering" the triage system. The online platform uses an initial AI model to filter out the content deemed most likely to be safe. The remaining jobs are then randomly assigned to human reviewers, a common practice due to fairness considerations in terms of mental workload. An AI model classifies each piece of content into a class (e.g., hate speech against a protected group) and places it in the virtual queue for the predicted class.
Figure 2: Histogram of average cumulative queueing cost of deep Q-learning policies over 672 hyperparameter configurations.
Figure 3: Cumulative cost with 2$\times$ standard errors
Figure 4: Optimal predicted-class workloads in the two-class example under different misclassification patterns.
Figure 5: We present the cumulative cost for different policies under different testing environments (with 2$\times$ the standard error encapsulated in the orange bracket).
...and 5 more figures

Theorems & Definitions (52)

Definition 1: Feasible Policies
Lemma 1: Joint weak convergence
Proposition 1: Fundamental Convergence Results
Theorem 1: Heavy-traffic lower bound
Lemma 2: KKT conditions
Lemma 3: Convergence of $\bm{J}^n_{\pi_n}(\cdot; Q^n)$
Definition 2: P$c\mu$-rule
Theorem 2: Optimality of P$c\mu$-rule
Proposition 2: Cumulative Cost Rate of the P$c\mu$-rule
Definition 3: Total cost of the AI-based triage system
...and 42 more
