Design and Scheduling of an AI-based Queueing System
Jiung Lee, Hongseok Namkoong, Yibo Zeng
TL;DR
The paper studies scheduling in AI-assisted service systems where a prediction model assigns each job's class, so that misclassifications impose congestion externalities on other jobs. It develops a heavy-traffic diffusion framework that yields a lower bound on queueing costs under class uncertainty, and it proves that the Pcμ-rule is asymptotically optimal in this setting; the rule balances workloads across predicted classes by solving a convex allocation problem. The authors also demonstrate an integrated design approach: selecting predictive models by their downstream queueing cost, and designing AI-based triage that jointly optimizes filtering and staffing under multiple cost components. In experiments on a content moderation task built on real toxicity data, the Pcμ-rule outperforms naive prediction-based policies and deep reinforcement learning (DRL) across varied cost structures and nonstationarities. The framework also supports practical model selection and triage-system tuning without expensive queueing simulations.
Abstract
To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by fine-tuning large language models.
