An Optimal-Control Approach to Infinite-Horizon Restless Bandits: Achieving Asymptotic Optimality with Minimal Assumptions

Chen YAN

TL;DR

We show that reachability of an optimal stationary state in the optimal-control problem is sufficient for the existence of an asymptotically optimal policy.

Abstract

We adopt an optimal-control framework for the undiscounted, infinite-horizon, discrete-time restless $N$-armed bandit problem. Whereas most studies construct policies from the relaxed single-armed Markov Decision Process (MDP), we relax the entire bandit MDP into an optimal-control problem via the certainty equivalence control principle. Our main contribution is to show that reachability of an optimal stationary state in the optimal-control problem is a sufficient condition for the existence of an asymptotically optimal policy. Such a policy can be constructed with an "align and steer" strategy. This reachability assumption is less stringent than any prior assumption imposed on the arm-level MDP; in particular, the unichain condition is no longer needed. Numerical examples show that using model predictive control for the steering step generally outperforms existing policies.
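One way to see what the certainty equivalence relaxation does is to compare a single random step of the $N$-armed system with its expected next state. The sketch below is illustrative only: the two-state transition matrices, the state and control vectors, and all function names are assumptions made for this example and are not taken from the paper. The state is written as the fraction of arms in each arm state, and the control as the fraction of arms in each state that are activated.

```python
import numpy as np

# Hypothetical 2-state arm (not from the paper).
# Rows are current states, columns are next states.
P_PASSIVE = np.array([[0.9, 0.1],
                      [0.4, 0.6]])
P_ACTIVE = np.array([[0.2, 0.8],
                     [0.7, 0.3]])


def drift(x, u):
    """Certainty-equivalent next state: expected fractions after one step."""
    return (x - u) @ P_PASSIVE + u @ P_ACTIVE


def simulate_step(x, u, n_arms, rng):
    """One random transition of the N-armed system, as state fractions."""
    counts = np.zeros(len(x))
    for s in range(len(x)):
        n_active = int(round(n_arms * u[s]))
        n_passive = int(round(n_arms * x[s])) - n_active
        # Arms in state s move independently according to their action.
        counts += rng.multinomial(n_active, P_ACTIVE[s])
        counts += rng.multinomial(n_passive, P_PASSIVE[s])
    return counts / n_arms


def mean_l1_noise(n_arms, n_trials=2000, seed=0):
    """Average L1 gap between the random next state and the drift."""
    rng = np.random.default_rng(seed)
    x = np.array([0.5, 0.5])   # assumed current state fractions
    u = np.array([0.3, 0.1])   # assumed activated fractions, u <= x
    target = drift(x, u)
    gaps = [np.abs(simulate_step(x, u, n_arms, rng) - target).sum()
            for _ in range(n_trials)]
    return float(np.mean(gaps))
```

Comparing `mean_l1_noise(100)` with `mean_l1_noise(10000)` should show the gap shrinking as $N$ grows, roughly like $1/\sqrt{N}$, which is the intuition behind treating the deterministic problem as a relaxation of the $N$-armed one.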

Paper Structure (25 sections, 5 theorems, 40 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Problem Setup and Literature Review
Model Description
The Approach via Optimal-Control
Arm States Concatenation and the CEC Problem
Stationary Problems
Value Comparison and Asymptotic Optimality
Comparison with Related Works
Reachability and Asymptotic Optimality
The Effective Control Rules
The Align and Steer Policy
Reachability and a Linear Control Rule
Reachability Implies Asymptotic Optimality
Step One
Step Two
...and 10 more sections

Key Result

Lemma 1

The random vector $\mathcal{E}(\mathbf{X}(t),\mathbf{U}(t)) \overset{d}{=} \mathbf{X}(t+1) - \phi(\mathbf{X}(t),\mathbf{U}(t))$ satisfies:

Figures (3)

Figure 1: Relationship of the three optimization problems.
Figure 2: An example where no priority policy is asymptotically optimal.
Figure 3: An example where certain priority policies perform slightly better than $\pi^{N}_{\text{align\&MPC}}$.

Theorems & Definitions (13)

Lemma 1 [gast2023linear]
Definition 1
Definition 2
Theorem 1
Definition 3
Definition 4
Lemma 2
proof
Theorem 2
proof
...and 3 more
