Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation

Guojun Xiong; Haichuan Wang; Yuqi Pan; Saptarshi Mandal; Sanket Shah; Niclas Boehmer; Milind Tambe

TL;DR

This work addresses scarce-resource allocation where each agent/arm can receive at most one intervention within a finite horizon. It introduces Finite-Horizon Single-Pull RMABs (SPRMABs) and a lightweight Single-Pull Index (SPI) policy built on a dummy-state expansion that enforces the one-pull constraint. The authors prove asymptotic optimality and a finite-sample bound on the average optimality gap, showing near-optimal performance (gap decaying as $\tilde{\mathcal{O}}(\frac{1}{\rho^{1/2}} + \frac{1}{\rho^{3/2}})$) and validate the approach across healthcare and other domains with extensive simulations. The method is computationally efficient, does not require indexability, and substantially outperforms traditional Whittle and LP-based policies under the single-pull constraint, highlighting its practical impact for fair and efficient scarce-resource allocation.

Abstract

Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains. However, in many practical settings with highly scarce resources, where each agent can receive at most one resource, such as healthcare intervention programs, the standard RMAB framework falls short. To tackle such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can be pulled only once. This single-pull constraint introduces additional complexity, rendering many existing RMAB solutions suboptimal or ineffective. To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint. We then design a lightweight index policy for this expanded system. For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\tilde{\mathcal{O}}\left(\frac{1}{\rho^{1/2}}\right)$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster. Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.
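To make the dummy-state idea concrete, here is a minimal sketch of one way such an expansion could be built. It is an illustration under stated assumptions, not the paper's definition: the function name `expand_with_dummy_states`, the state ordering (original states first, dummy copies second), and the rule that a pulled arm follows the passive kernel inside the dummy copies regardless of action are all hypothetical choices made here.

```python
import numpy as np

def expand_with_dummy_states(P0, P1):
    """Return (E0, E1), transition kernels over 2*S expanded states.

    Assumed layout (hypothetical, for illustration only):
      indices 0..S-1   : original states (arm not yet pulled)
      indices S..2S-1  : dummy copies     (arm already pulled)
    P0 and P1 are the S x S passive and active kernels of one arm.
    """
    P0 = np.asarray(P0, dtype=float)
    P1 = np.asarray(P1, dtype=float)
    S = P0.shape[0]
    E0 = np.zeros((2 * S, 2 * S))
    E1 = np.zeros((2 * S, 2 * S))

    # Passive action in an original state: evolve under P0, stay unpulled.
    E0[:S, :S] = P0
    # Active action in an original state: evolve under P1, but land in the
    # dummy copy of the next state, which records that the pull was used.
    E1[:S, S:] = P1
    # Inside the dummy copies, both actions follow the passive kernel
    # (assumption): a second pull has no effect and the arm never returns
    # to the original states.
    E0[S:, S:] = P0
    E1[S:, S:] = P0
    return E0, E1

# Toy usage with two original states, giving a 4-state expanded system.
# The transition probabilities below are made-up numbers.
P0 = [[0.9, 0.1], [0.4, 0.6]]
P1 = [[0.3, 0.7], [0.1, 0.9]]
E0, E1 = expand_with_dummy_states(P0, P1)
print(E0.shape)        # (4, 4)
print(E1.sum(axis=1))  # each row is a probability distribution
```

Under this layout, a standard RMAB solver applied to the expanded kernels gains nothing from pulling an arm that already sits in a dummy state, which is how the single-pull constraint ends up encoded in the dynamics rather than imposed as a side condition.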

Paper Structure (35 sections, 11 theorems, 48 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Motivating Domains and Examples
System Model and Problem Formulation
Existing Index Policies and Failure Examples
Challenge for Extending Existing Methods
Proposed Method
The Single-Pull Index Policy
Asymptotic and Non-Asymptotic Optimality
Experiments
Benchmarks
Experimental Domains
Continuous Positive Airway Pressure Therapy (CPAP) [herlihy2023planning, li2022towards, wang2024online]
Mobile Healthcare for Maternal Health (MHMH) [ghosh2022indexability]
Numerical Results
Conclusions
...and 20 more sections

Key Result

Proposition 1

The MDP for each patient defined in Example 1 is indexable.

Figures (8)

Figure 1: General transition kernels for a patient in the CPAP example, with $a=0$ (top) and $a=1$ (bottom).
Figure 2: A CPAP setting with $3$ states and $20$ types of arms, with $10$ arms per type; the budget is $10$ and the time horizon is $10$.
Figure 3: A toy example of an SPRMAB with dummy states. The original state space ${\mathcal{S}}=\{s_0,s_1\}$ leads to a 4-state expanded system ${\mathcal{S}}^\prime=\{s_0, s_1, s_{0,d}, s_{1,d}\}$.
Figure 4: Average running time of the SPI policy, the finite Whittle policy, and the infinite Whittle policy in the CPAP setting $(N,S,K,\rho,T)=(10, 10, 50, 50, 10)$.
Figure 5: The chart corresponding to Table \ref{tab:formula_table}, with the performance of all policies normalized between the optimal upper bound and the random policy. The optimal upper bound is set to 1 and the random policy to 0, so each policy's relative performance can be compared directly.
...and 3 more figures

Theorems & Definitions (20)

Example 1
Proposition 1
Proposition 2
Proposition 3
Remark 1
Definition 1: Dummy state
Remark 2
Example 2
Remark 3
Remark 4
...and 10 more
