Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach
Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora
TL;DR
This work addresses optimal control of fluid restless multi-armed bandits (FRMABs) whose state equations are affine or quadratic in the state variables. It uses Pontryagin-based optimality conditions and a shooting method to generate optimal trajectories, then learns a time-dependent state-feedback policy with Optimal Classification Trees with Hyperplane Splits (OCT-H), augmented with nonlinear features to capture switching curves. The approach is validated on machine maintenance, epidemic control, and fisheries control problems, achieving high imitation accuracy and speed-ups of up to 26 million times over a direct numerical algorithm for fluid problems. The learned policies are interpretable, scale to larger problem sizes while maintaining state feasibility, and are fast enough for real-time use.
Abstract
We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
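As a rough illustration of the trajectory-generation step (Pontryagin conditions solved by shooting), the sketch below applies the idea to a toy scalar linear-quadratic problem. It is not the FRMAB model or the authors' algorithm: the dynamics x' = a x + u, the cost 0.5 (x^2 + u^2), the horizon, and the function names `integrate` and `shoot` are all assumptions chosen for brevity. Pontryagin's conditions give u = -p and p' = -x - a p with p(T) = 0, and shooting searches for the initial costate p(0) that meets the terminal condition.

```python
# Toy shooting method (illustrative assumptions, not the paper's FRMAB model):
# minimize the integral of 0.5*(x^2 + u^2) over [0, T], subject to x' = a*x + u.

def integrate(x0, p0, a=-0.5, horizon=1.0, steps=1000):
    """Integrate state x and costate p forward with RK4; return (x(T), p(T))."""
    def rhs(x, p):
        u = -p                        # stationarity of the Hamiltonian in u
        return a * x + u, -x - a * p  # state equation, costate equation

    h = horizon / steps
    x, p = x0, p0
    for _ in range(steps):
        k1 = rhs(x, p)
        k2 = rhs(x + 0.5 * h * k1[0], p + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], p + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], p + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        p += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, p


def shoot(x0, tol=1e-10, max_iter=50):
    """Secant iteration on p(0) so that the terminal condition p(T) = 0 holds."""
    def residual(p0):
        return integrate(x0, p0)[1]

    p_prev, p_curr = 0.0, 1.0
    g_prev, g_curr = residual(p_prev), residual(p_curr)
    for _ in range(max_iter):
        if abs(g_curr) < tol:
            break
        p_next = p_curr - g_curr * (p_curr - p_prev) / (g_curr - g_prev)
        p_prev, g_prev = p_curr, g_curr
        p_curr, g_curr = p_next, residual(p_next)
    return p_curr


if __name__ == "__main__":
    for x0 in (0.5, 1.0, 2.0):
        p0 = shoot(x0)
        # The optimal initial control for this toy problem is u(0) = -p(0).
        print(f"x0={x0:.1f}  p(0)={p0:.6f}  u(0)={-p0:.6f}")
```

Repeating `shoot` over a grid of initial states yields (state, control) pairs, which is the role the training set plays in the pipeline described above; a classifier such as OCT-H would then be fit to those pairs.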
