Minimizing Cost Rather Than Maximizing Reward in Restless Multi-Armed Bandits

R. Teal Witter; Lisa Hellerstein

TL;DR

This paper introduces a constrained minimization problem for RMABs in which the goal is to reach a reward threshold at minimum total cost, and shows that even a bi-criteria approximate version of the problem is PSPACE-hard.

Abstract

Restless Multi-Armed Bandits (RMABs) offer a powerful framework for solving resource-constrained maximization problems. However, the formulation can be inappropriate for settings where the limiting constraint is a reward threshold rather than a budget. We introduce a constrained minimization problem for RMABs that balances the goal of achieving a reward threshold while minimizing total cost. We show that even a bi-criteria approximate version of the problem is PSPACE-hard. Motivated by the hardness result, we define a decoupled problem, indexability, and a Whittle index for the minimization problem, mirroring the corresponding concepts for the maximization problem. Further, we show that the Whittle index for the minimization problem can easily be computed from the Whittle index for the maximization problem. Consequently, Whittle index results on RMAB instances for the maximization problem give Whittle index results for the minimization problem. Despite the similarities between the minimization and maximization problems, solving the minimization problem is not as simple as taking direct analogs of the heuristics for the maximization problem. We give an example of an RMAB for which the greedy Whittle index heuristic achieves the optimal solution for the maximization problem, while the analogous heuristic yields the worst possible solution for the minimization problem. In light of this, we present and compare several heuristics for solving the minimization problem on real and synthetic data. Our work suggests the importance of continued investigation into the minimization problem.
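To make the contrast between the two objectives concrete, here is a minimal sketch in a deliberately simplified one-shot setting. It is not the RMAB model, the Whittle index, or any heuristic from the paper: each arm is just an assumed (cost, reward) pair, and the function names `greedy_by_reward` and `cheapest_feasible` are hypothetical. It only illustrates that a rule that is natural for maximizing reward can be costly when the goal is to reach a reward threshold as cheaply as possible.

```python
from itertools import combinations

# Each arm is a (cost, reward) pair. This is a toy, one-shot simplification
# (an assumption for illustration), not the restless bandit model itself.

def greedy_by_reward(arms, threshold):
    """Pick arms in decreasing order of reward until the threshold is met.

    Returns the list of chosen arm indices, or None if the threshold
    cannot be reached even with every arm selected.
    """
    chosen, total_reward = [], 0.0
    order = sorted(range(len(arms)), key=lambda i: arms[i][1], reverse=True)
    for idx in order:
        if total_reward >= threshold:
            break
        chosen.append(idx)
        total_reward += arms[idx][1]
    return chosen if total_reward >= threshold else None

def cheapest_feasible(arms, threshold):
    """Brute-force the minimum-cost subset whose reward meets the threshold.

    Returns (cost, indices), or None if no subset is feasible. Exponential
    in the number of arms, so only suitable for tiny examples.
    """
    best = None
    for k in range(len(arms) + 1):
        for subset in combinations(range(len(arms)), k):
            if sum(arms[i][1] for i in subset) >= threshold:
                cost = sum(arms[i][0] for i in subset)
                if best is None or cost < best[0]:
                    best = (cost, list(subset))
    return best

def total_cost(arms, indices):
    """Sum the costs of the selected arms."""
    return sum(arms[i][0] for i in indices)

# Hypothetical data: one expensive high-reward arm, two cheap moderate arms.
EXAMPLE_ARMS = [(5.0, 3.0), (1.0, 2.0), (1.0, 2.0)]
EXAMPLE_THRESHOLD = 3.0
```

On the example data, the reward-greedy rule selects only the expensive arm, while the brute-force search meets the same threshold with the two cheap arms at lower total cost. The brute-force search is there only as a reference point for tiny inputs; the hardness result in the abstract indicates that no such exact approach should be expected to scale in the restless setting.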

Paper Structure (24 sections, 6 theorems, 25 equations, 3 figures, 7 algorithms)

Introduction
Wildlife Conservation
Energy Management
Healthcare
Our Contributions
Related Work
Restless Multi-Armed Bandits
Restless Bandits and Exact Whittle Index
Q-Learning Whittle Index
General Guarantees of the Whittle Index
RMAB Definition and Notation
The Minimization Problem
Decoupling the Minimization Problem
Remark:
Maximization vs Minimization Problems
...and 9 more sections

Key Result

Theorem 5

Fix $\alpha \geq 1$ and $\rho > 0$. Finding an $(\alpha,\rho)$-approximate strategy for the minimization problem is PSPACE-hard even when costs are binary.

Figures (3)

Figure 1: Because the costs and rewards are similar in the dataset, the exponentially increasing budget and the reward truncation have little effect. As a result, Algorithms \ref{alg:min_heuristic}, \ref{alg:budget_min_heuristic}, and \ref{alg:truncated_min_heuristic} all give solutions with similar cost.
Figure 2: Because the expected reward of unreliable MDPs is higher, Algorithms \ref{alg:min_heuristic} and \ref{alg:budget_min_heuristic} select them first. However, Algorithm \ref{alg:truncated_min_heuristic} quickly gives solutions with lower cost because it truncates the large rewards of the second group and selects the MDPs with reliable reward instead.
Figure 3: Since the probabilities and rewards are similar in the uniform parameter setting, expected reward is a good indication of quality and Algorithm \ref{alg:min_heuristic} gives the best performance.

Theorems & Definitions (17)

Definition 2: Maximization Whittle Index (Whittle, 1988)
Theorem 5
Proof: Proof Outline
Proof: Proof of Decoupling
Definition 7: Indexability for Minimization
Definition 8: Whittle Index for Minimization
Corollary 1
Corollary 2
Claim 9
Proof: Proof of Claim \ref{claim:min_vs_max}
...and 7 more
