
Optimizing Urban Service Allocation with Time-Constrained Restless Bandits

Yi Mao, Andrew Perrault

TL;DR

This work tackles the problem of scheduling urban food inspections under strict per-establishment frequency guarantees and a limited per-period budget by extending restless multi-armed bandits (RMABs) with action-window constraints. It combines a window-augmented MDP encoding, an integer-programming lookahead, and a window-optimization scheme to enforce the constraints while maximizing inspection impact, and it learns state transitions with a neural network model trained on CDPH data. Empirically, the approach yields substantial objective gains, up to 24% in synthetic experiments and 33% on real CDPH data, while remaining robust to surprise inspections and across budget regimes. The results demonstrate scalable RMAB planning under service constraints and offer a principled path for deploying constrained scheduling in urban public services.

Abstract

Municipal inspections are an important part of maintaining the quality of goods and services. In this paper, we approach the problem of intelligently scheduling service inspections to maximize their impact, using food establishment inspections in Chicago as a case study. The Chicago Department of Public Health (CDPH) inspects thousands of establishments each year, with a substantial fail rate (over 3,000 failed inspection reports in 2023). To balance the objectives of ensuring adherence to guidelines, minimizing disruption to establishments, and minimizing inspection costs, CDPH assigns each establishment an inspection window every year and guarantees that each establishment will be inspected exactly once during that window. Meanwhile, CDPH also promises surprise public health inspections for unexpected food safety emergencies or complaints. These constraints pose a challenge for restless multi-armed bandit (RMAB) approaches, and no existing RMAB methods handle them. We develop an extension to Whittle index-based systems for RMABs that can guarantee action window constraints and frequencies, and that can furthermore be leveraged to optimize the action window assignments themselves. Briefly, we combine MDP reformulation and integer programming-based lookahead to maximize the impact of inspections subject to constraints. We also develop a neural network-based supervised learning model of the state transitions of real Chicago establishments from public CDPH inspection records, which achieves a 10% AUC improvement over directly predicting establishments' failures. Our experiments show objective improvements of up to 24% (in simulation) and 33% (on real data) from our approach, demonstrate its robustness to surprise inspections, and give insight into the impact of scheduling constraints.

Paper Structure

This paper contains 40 sections, 1 theorem, 11 equations, 5 figures, 2 tables.

Key Result

Proposition 1

Maximizing the sum of Whittle indices, either without additional frequency constraints or with the constraint that each arm must be pulled exactly once during the lookahead window, can be reduced to a weighted $b$-matching problem [b_matching].
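Proposition 1 can be made concrete on a toy lookahead. The sketch below is an illustration under stated assumptions, not the paper's solver: `W[i][t]` is a hypothetical table of precomputed Whittle indices for arm `i` at lookahead step `t`, `budget` is the number of arms pulled per step, and the function names are made up for this example. Without frequency constraints only the time steps have capacity, so taking the `budget` largest indices at each step is optimal. With the exactly-once constraint every arm has degree 1 and every step has capacity `budget`; replicating each step `budget` times turns this into a rectangular assignment problem that a Hungarian-type solver handles in polynomial time, but the sketch uses exhaustive search so it stays dependency-free and easy to check.

```python
from itertools import product


def top_b_per_step(W, budget):
    """No frequency constraint: pull the `budget` highest-index arms at each step.

    W[i][t] is a (hypothetical) precomputed Whittle index for arm i at step t.
    """
    n_arms, horizon = len(W), len(W[0])
    plan = []
    for t in range(horizon):
        ranked = sorted(range(n_arms), key=lambda i: W[i][t], reverse=True)
        plan.append(sorted(ranked[:budget]))
    return plan


def exactly_once_bruteforce(W, budget):
    """Exactly-once constraint: every arm gets one step, every step at most `budget` arms.

    Exhaustive search over all arm-to-step assignments; only viable for toy sizes.
    Returns (best total index, assignment tuple), or (-inf, None) if infeasible.
    """
    n_arms, horizon = len(W), len(W[0])
    best_value, best_assign = float("-inf"), None
    for assign in product(range(horizon), repeat=n_arms):
        loads = [0] * horizon
        for t in assign:
            loads[t] += 1
        if max(loads) > budget:  # violates the per-step budget (step capacity)
            continue
        value = sum(W[i][t] for i, t in enumerate(assign))
        if value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign
```

For example, with `W = [[5, 1], [4, 3], [2, 2]]` and `budget=2`, `exactly_once_bruteforce` returns `(11, (0, 0, 1))`: arms 0 and 1 share step 0 and arm 2 takes step 1.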

Figures (5)

  • Figure 1: An example portion of the MDP after encoding the action window constraint. Suppose we have 5 belief states $(b_1, ..., b_5)$, an action window at months 3 and 4, and 12 months between action windows. 0 is the passive action, 1 is the active action. After $(b_5, 12, 0)$ is reached, a new chain begins at $(b_5, 1, 0)$ (not shown).
  • Figure 2: Results from the synthetic domain (with standard errors).
  • Figure 3: CDPH domain results. There are no error bars because the CDPH data defines a single model.
  • Figure 4: The reward drop caused by introducing surprise inspections at a 1% rate in the synthetic domain, relative to the rewards in Figure 2.
  • Figure 5: Structure of the neural network used to learn the state transitions of food establishments.
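Figure 1 describes the window-augmented MDP only by example (action window at months 3 and 4, a 12-month cycle). As a rough sketch, assume each augmented state is (belief, month in cycle, acted flag), where the hypothetical `acted` flag records whether the arm was already pulled in the current cycle's window; the paper's exact state layout may differ. The belief component evolves as usual and does not affect which actions are allowed, so the sketch tracks only the calendar part. The constants and the functions `allowed_actions`, `next_state`, and `simulate` are illustrative names, not from the paper. Masking actions this way guarantees exactly one pull per window for any policy that respects the mask.

```python
import random

CYCLE = 12        # months between action windows (Figure 1 example)
WINDOW = (3, 4)   # months in which the action window is open (Figure 1 example)


def allowed_actions(month, acted):
    """Actions (0 = passive, 1 = active) permitted in augmented state (month, acted).

    `acted` is a hypothetical flag: True once the arm was pulled in this cycle's window.
    """
    if month not in WINDOW or acted:
        return {0}      # outside the window, or already inspected this cycle
    if month == WINDOW[-1]:
        return {1}      # last month of the window: the pull is forced
    return {0, 1}       # earlier window month: the policy may choose


def next_state(month, acted, action):
    """Advance the calendar component; reset the flag when a new cycle starts."""
    if month == CYCLE:
        return 1, False
    return month + 1, acted or action == 1


def simulate(n_months, rng):
    """Run a random policy restricted to allowed actions; return the months pulled."""
    month, acted, pulls = 1, False, []
    for _ in range(n_months):
        action = rng.choice(sorted(allowed_actions(month, acted)))
        if action == 1:
            pulls.append(month)
        month, acted = next_state(month, acted, action)
    return pulls
```

Over 36 months, any seed yields exactly three pulls, each in month 3 or 4, because the mask forces a pull in month 4 when none happened in month 3 and forbids a second one.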

Theorems & Definitions (2)

  • Proposition 1
  • Proof