Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting

Gianmarco Genalti; Marco Mussi; Nicola Gatti; Marcello Restelli; Matteo Castiglioni; Alberto Maria Metelli

TL;DR

The paper proposes Graph-Triggered Bandits (GTBs), a unifying framework that generalizes and extends rested and restless bandits, and studies two types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where it decreases.

Abstract

Rested and restless bandits are two well-known bandit settings used to model real-world sequential decision-making problems in which the expected reward of an arm evolves over time, either because of the actions we perform or because of nature. In this work, we propose Graph-Triggered Bandits (GTBs), a unifying framework that generalizes and extends rested and restless bandits. In this setting, the evolution of the arms' expected rewards is governed by a graph defined over the arms. An edge connecting a pair of arms $(i,j)$ means that a pull of arm $i$ triggers the evolution of arm $j$, and vice versa. Interestingly, rested and restless bandits are both special cases of our model for suitable (degenerate) graphs. As relevant case studies for this setting, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs. For these cases, we study the optimal policies. We provide suitable algorithms for all scenarios and discuss their theoretical guarantees, highlighting how the complexity of the learning problem depends on instance-dependent terms that encode specific properties of the underlying graph structure.
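To make the triggering mechanism concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the graph is encoded as a symmetric 0/1 matrix `G` with ones on the diagonal (a pull of an arm also triggers that arm), that the identity matrix recovers the rested case, and that the all-ones matrix recovers the restless case. The function name `trigger_counts` and the reward curves `rising_mean` and `rotting_mean` are hypothetical and chosen only for illustration.

```python
import numpy as np

def trigger_counts(G, pulls):
    """Count how many times each arm has been triggered.

    Assumption: G[i, j] == 1 means a pull of arm i triggers arm j.
    """
    G = np.asarray(G, dtype=int)
    counts = np.zeros(G.shape[0], dtype=int)
    for arm in pulls:
        counts += G[arm]  # every neighbor of the pulled arm evolves
    return counts

def rising_mean(n):
    """Hypothetical rising curve: expected reward grows with triggers."""
    return 1.0 - 1.0 / (1.0 + n)

def rotting_mean(n):
    """Hypothetical rotting curve: expected reward shrinks with triggers."""
    return 1.0 / (1.0 + n)

k = 3
pulls = [0, 0, 1, 2, 0]  # arm 0 pulled three times, arms 1 and 2 once each

rested = np.eye(k, dtype=int)           # an arm evolves only when pulled
restless = np.ones((k, k), dtype=int)   # every pull evolves every arm
block = np.array([[1, 1, 0],            # arms 0 and 1 trigger each other,
                  [1, 1, 0],            # arm 2 evolves on its own
                  [0, 0, 1]])

rested_counts = trigger_counts(rested, pulls)
restless_counts = trigger_counts(restless, pulls)
block_counts = trigger_counts(block, pulls)

print(rested_counts, restless_counts, block_counts)
print(rising_mean(block_counts), rotting_mean(block_counts))
```

With the identity matrix, each trigger count equals the number of pulls of that arm. With the all-ones matrix, it equals the number of rounds played. The block-diagonal matrix sits between the two: arms in the same block share a trigger count.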

Paper Structure (40 sections, 32 theorems, 122 equations, 2 figures, 4 algorithms)

Introduction
Contributions.
Graph-Triggered Bandits
Notions on Rested and Restless Bandits
Setting
Block-Diagonal Connectivity Matrix.
Learning Problem
Rising Graph-Triggered Bandits
Instance Characterization.
Optimality in Rising GTBs
Deterministic Rising GTBs
Algorithm for Deterministic Rising GTBs with Block-Diagonal CMs
Algorithm for Deterministic Rising GTBs with General Matrices
Stochastic Rising GTBs
Algorithm.
...and 25 more sections

Key Result

Theorem 1

Computing the optimal policy in Rising GTBs with general matrices $\mathbf{G}$ is NP-Hard.

Figures (2)

Figure 1: Examples of $3$-armed GTBs.
Figure 2: Instances used in the proof of Theorem \ref{thr:rottingregretLBgeneric}.

Theorems & Definitions (38)

Remark 1: Inclusion of Rested and Restless bandits in GTBs
Remark 2: On the Chosen Notion of Regret
Theorem 1: Complexity of finding the Optimal Policy in Rising GTBs
Theorem 2: Optimal Policy in Rising GTBs with Block-Diagonal CM
Theorem 3: Regret in Det. Rising GTBs with Block-Diagonal CMs
Definition 1: Block Sub-matrix
Theorem 4: Regret in Det. Rising GTBs with General Matrices
Remark 3: Computational Complexity
Lemma 1: Concentration of Estimator, adapted from Metelli et al. (2022)
Remark 4: Computational Complexity
...and 28 more
