Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation
Chen Yan, Weina Wang, Lei Ying
TL;DR
This work tackles the finite-horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms, where standard fluid (LP-based) policies can incur a $\Theta(1/\sqrt{N})$ per-arm optimality gap in degenerate settings. The authors introduce a Gaussian stochastic system that augments the fluid approximation by capturing both the mean and the variance of the dynamics around the fluid optimum $\mathbf{y}^*$. They then solve a Gaussian stochastic program (SP) within a $\tilde{\Theta}(1/\sqrt{N})$-neighborhood of $\mathbf{y}^*$ to derive an SP-based policy. Under a Uniqueness Assumption, this SP-based policy achieves a global optimality gap of $\tilde{\mathcal{O}}(1/N)$, improving on the $\Theta(1/\sqrt{N})$ gap of LP-based approaches; the paper also proves that, without the Uniqueness Assumption, the SP-based approach still offers meaningful improvements. Numerical experiments on machine-maintenance RMABs complement the theory, showing that the SP-based policy yields substantial gains over LP-based policies as $N$ grows, with computational methods (SAA/EDDP) that scale linearly in the horizon and the state space. Overall, the work provides a principled, scalable route to near-optimal policies for degenerate RMABs and highlights the value of variance-aware Gaussian approximations in stochastic decision problems.
Abstract
We study the finite-horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms. Prior work has shown that when an RMAB satisfies a non-degeneracy condition, Linear-Programming-based (LP-based) policies derived from the fluid approximation, which captures the mean dynamics of the system, achieve an exponentially small optimality gap. However, it is common for RMABs to be degenerate, in which case LP-based policies can result in a $\Theta(1/\sqrt{N})$ optimality gap per arm. In this paper, we propose a novel Stochastic-Programming-based (SP-based) policy that, under a uniqueness assumption, achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs. Our approach is based on the construction of a Gaussian stochastic system that captures not only the mean but also the variance of the RMAB dynamics, resulting in a more accurate approximation than the fluid approximation. We then solve a stochastic program for this system to obtain our policy. This is the first result to establish an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs.
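
The $1/\sqrt{N}$ scale of the fluctuations that the fluid approximation ignores can be seen in a toy setting. The sketch below is only an illustration and is not the paper's construction: it assumes a hypothetical two-state arm in which every arm starts in state 0 and moves to state 1 with probability `p = 0.3` in one step, and the function name `scaled_fluctuation` is made up for this example. The fluid approximation tracks only the mean fraction `p`; the deviation of the empirical fraction from that mean, rescaled by $\sqrt{N}$, should stay roughly constant near $\sqrt{p(1-p)}$, which is the variance information a Gaussian approximation retains.

```python
import numpy as np


def scaled_fluctuation(n_arms, p=0.3, trials=20000, seed=0):
    """Return sqrt(N) times the std of the one-step state-1 fraction.

    Assumption (illustrative only): N independent, identical arms all start
    in state 0 and each moves to state 1 with probability p.
    """
    rng = np.random.default_rng(seed)
    # Number of arms that land in state 1 after one step, for each trial.
    moved = rng.binomial(n_arms, p, size=trials)
    fraction = moved / n_arms
    fluid_mean = p  # the fluid approximation keeps only this mean
    deviation = fraction - fluid_mean
    # Rescale by sqrt(N): a central-limit argument suggests an O(1) value.
    return np.sqrt(n_arms) * np.std(deviation)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, scaled_fluctuation(n))
```

In this toy model the printed values should all lie close to $\sqrt{0.3 \cdot 0.7} \approx 0.46$ regardless of $N$, so the unscaled deviation from the fluid mean shrinks like $1/\sqrt{N}$.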
