Optimizing Urban Service Allocation with Time-Constrained Restless Bandits

Yi Mao; Andrew Perrault

TL;DR

This work schedules urban food inspections under strict per-establishment frequency guarantees and a limited per-period budget by extending restless multi-armed bandits (RMABs) with action-window constraints. It combines a window-augmented MDP encoding, an integer-programming lookahead, and a window-optimization scheme to enforce the constraints while maximizing inspection impact, and it uses a neural network trained on Chicago Department of Public Health (CDPH) data to learn state transitions. Empirically, the approach improves the objective by up to $24\%$ in synthetic experiments and up to $33\%$ on real CDPH data, and it remains robust to surprise inspections and across budget regimes. The results demonstrate scalable RMAB planning under service constraints and offer a principled path for deploying constrained scheduling in urban public services.

Abstract

Municipal inspections are an important part of maintaining the quality of goods and services. In this paper, we approach the problem of intelligently scheduling service inspections to maximize their impact, using food establishment inspections in Chicago as a case study. The Chicago Department of Public Health (CDPH) inspects thousands of establishments each year, with a substantial fail rate (over 3,000 failed inspection reports in 2023). To balance the objectives of ensuring adherence to guidelines, minimizing disruption to establishments, and minimizing inspection costs, CDPH assigns each establishment an inspection window every year and guarantees that it will be inspected exactly once during that window. Meanwhile, CDPH also promises surprise public health inspections in response to unexpected food safety emergencies or complaints. These constraints pose a challenge for restless multi-armed bandit (RMAB) approaches, since no existing RMAB method handles them. We develop an extension to Whittle index-based systems for RMABs that can guarantee action window constraints and frequencies, and that can furthermore be leveraged to optimize the action window assignments themselves. Briefly, we combine MDP reformulation and integer programming-based lookahead to maximize the impact of inspections subject to constraints. A neural network-based supervised learning model is developed to model state transitions of real Chicago establishments using public CDPH inspection records; it achieves a 10% AUC improvement over directly predicting establishments' failures. Our experiments show objective improvements of up to 24% (in simulation) and 33% (on real data), demonstrate robustness to surprise inspections, and give insight into the impact of scheduling constraints.
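To make the scheduling constraints concrete, the following is a minimal Python sketch of a budgeted schedule in which every establishment is inspected exactly once inside its window. It is not the paper's method: it replaces the Whittle index with a fixed, hypothetical priority score per establishment, and it replaces the integer-programming lookahead with a simple rule that forces an inspection when a window is about to close. The function name `schedule_with_windows` and all parameter names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def schedule_with_windows(
    windows: List[Tuple[int, int]],  # (start, end) per establishment, inclusive
    scores: List[float],             # hypothetical static priority per establishment
    budget: int,                     # maximum inspections per period
    horizon: int,                    # number of periods to plan
) -> Tuple[List[List[int]], List[Optional[int]]]:
    """Greedy deadline-forcing schedule (illustrative stand-in, not the paper's algorithm).

    Returns the list of establishments inspected in each period and, for each
    establishment, the period in which it was inspected (None if never).
    """
    inspected_at: List[Optional[int]] = [None] * len(windows)
    plan: List[List[int]] = []
    for t in range(horizon):
        # Establishments whose window is open now and that are not yet inspected.
        open_arms = [
            i for i, (start, end) in enumerate(windows)
            if inspected_at[i] is None and start <= t <= end
        ]
        # Windows closing this period must be served now to keep the guarantee.
        urgent = [i for i in open_arms if windows[i][1] == t]
        if len(urgent) > budget:
            raise ValueError(f"period {t}: {len(urgent)} closing windows exceed budget {budget}")
        # Spend any remaining budget on the highest-scoring non-urgent establishments.
        optional = sorted(
            (i for i in open_arms if windows[i][1] != t),
            key=lambda i: -scores[i],
        )
        chosen = urgent + optional[: budget - len(urgent)]
        for i in chosen:
            inspected_at[i] = t
        plan.append(chosen)
    return plan, inspected_at
```

Because this rule only reacts when a window closes, it can fail on instances that a lookahead would solve by inspecting some establishments earlier, which is one motivation for planning ahead rather than acting greedily.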
