Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora

TL;DR

This work addresses optimal control for fluid restless multi-armed bandits with affine or quadratic dynamics by leveraging Pontryagin-based optimality conditions and a shooting method to generate optimal trajectories. It then learns a time-dependent state-feedback policy using Optimal Classification Trees with Hyperplane Splits (OCT-H), enhanced with nonlinear feature augmentation to capture switching curves. The approach is validated on machine maintenance, epidemic control, and fisheries control, achieving high imitation accuracy and substantial speed-ups (up to $26$ million times) over solving from scratch. The results demonstrate practical, interpretable policies that scale to larger problem sizes while maintaining state feasibility and offering real-time applicability in complex FRMAB settings.
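The role of nonlinear feature augmentation can be illustrated with a toy example. The sketch below is not the paper's OCT-H implementation and uses no tree learner. It assumes a hypothetical switching curve $x_2 = x_1^2$ (chosen purely for illustration) and shows that a single hyperplane split reproduces the switching rule exactly once $x_1^2$ is appended to the state, whereas a hand-picked split on the raw state does not. All function names and the sampled data are assumptions.

```python
import random


def switching_label(x1, x2):
    # Hypothetical switching curve x2 = x1**2 (an assumption for illustration):
    # the control is active (1) strictly above the curve, passive (0) otherwise.
    return 1 if x2 > x1 ** 2 else 0


def augment(x1, x2):
    # Nonlinear feature augmentation: append x1**2 to the raw state (x1, x2).
    return (x1, x2, x1 ** 2)


def raw(x1, x2):
    # Raw state without augmentation.
    return (x1, x2)


def hyperplane_split(features, weights, bias):
    # One hyperplane split: predict 1 when w . phi + b > 0, else 0.
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def accuracy(points, featurize, weights, bias):
    # Fraction of points where the split agrees with the switching rule.
    hits = sum(
        hyperplane_split(featurize(x1, x2), weights, bias) == switching_label(x1, x2)
        for x1, x2 in points
    )
    return hits / len(points)


rng = random.Random(0)
points = [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(1000)]

# In the augmented space, x2 - x1**2 = 0 is a hyperplane: w = (0, 1, -1), b = 0.
acc_augmented = accuracy(points, augment, (0.0, 1.0, -1.0), 0.0)

# On the raw state, a hand-picked split x2 > 1/3 cannot follow the curve.
acc_raw = accuracy(points, raw, (0.0, 1.0), -1.0 / 3.0)

print(f"augmented split accuracy: {acc_augmented:.3f}")
print(f"raw split accuracy:       {acc_raw:.3f}")
```

In this toy setting the weights are written down by hand. In the approach summarized above, the splits would instead be learned by OCT-H from trajectories generated with the shooting method.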

Abstract

We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.

Paper Structure

This paper contains 29 sections, 9 theorems, 42 equations, 1 figure, 3 tables, 3 algorithms.

Key Result

Lemma 1

Under Assumption ass:concave, $\bm{x}^*(\cdot)$ and $\bm{u}^*(\cdot)$ are optimal state and control trajectories for Problem (eq:genrmabp) if and only if there exists a continuous and piecewise continuously differentiable costate variable $\bm{y}(\cdot)$ such that

Figures (1)

  • Figure 1: The decision tree OCT-H learned for the infinite server routing problem with $n = 2$.

Theorems & Definitions (18)

  • Lemma 1: Pontryagin Maximum Principle (Bittner1963LSP; grassetal08)
  • Proposition 1
  • proof
  • Remark 1
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • ...and 8 more