Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting

Gianmarco Genalti, Marco Mussi, Nicola Gatti, Marcello Restelli, Matteo Castiglioni, Alberto Maria Metelli

TL;DR

This paper proposes Graph-Triggered Bandits (GTBs), a unifying framework that generalizes and extends rested and restless bandits. It focuses on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.

Abstract

Rested and restless bandits are two well-known bandit settings for modeling real-world sequential decision-making problems in which the expected reward of an arm evolves over time, either because of the actions we perform or because of nature. In this work, we propose Graph-Triggered Bandits (GTBs), a unifying framework that generalizes and extends rested and restless bandits. In this setting, the evolution of the arms' expected rewards is governed by a graph defined over the arms. An edge connecting a pair of arms $(i,j)$ represents the fact that a pull of arm $i$ triggers the evolution of arm $j$, and vice versa. Interestingly, rested and restless bandits are both special cases of our model for suitable (degenerate) graphs. As relevant case studies for this setting, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs. For these cases, we study the optimal policies. We provide suitable algorithms for all scenarios and discuss their theoretical guarantees, characterizing the complexity of the learning problem in terms of instance-dependent quantities that encode specific properties of the underlying graph structure.
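The triggering mechanism described in the abstract can be illustrated with a minimal sketch. It assumes the graph is encoded as a binary matrix `G` in which `G[i, j] == 1` means that a pull of arm `i` triggers arm `j`. The identity and all-ones matrices used below for the rested and restless cases are our reading of the "degenerate graph" remark, not something this summary states explicitly. The function names and the reward curve are hypothetical and chosen only for illustration.

```python
import numpy as np


def trigger_counts(G, pulls):
    """Count how many times each arm has been triggered by a pull sequence.

    G[i, j] == 1 is assumed to mean "pulling arm i triggers arm j".
    """
    G = np.asarray(G, dtype=int)
    counts = np.zeros(G.shape[0], dtype=int)
    for i in pulls:
        counts += G[i]  # every arm connected to the pulled arm evolves
    return counts


def rising_reward(n_triggers):
    """Hypothetical rising curve: non-decreasing in the number of triggers."""
    return 1.0 - 1.0 / (1.0 + n_triggers)


k = 3
pulls = [0, 0, 1]

# Assumed rested case: an arm evolves only when it is pulled itself.
rested_counts = trigger_counts(np.eye(k, dtype=int), pulls)

# Assumed restless case: every pull makes every arm evolve.
restless_counts = trigger_counts(np.ones((k, k), dtype=int), pulls)

# An intermediate, block-diagonal case: arms 0 and 1 trigger each other.
block = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 1]]
block_counts = trigger_counts(block, pulls)

print(rested_counts, restless_counts, block_counts)
print(rising_reward(block_counts))
```

Under these assumptions, the same pull sequence yields different trigger counts, and hence different expected rewards, depending only on the graph. This is the sense in which the graph interpolates between the rested and restless settings.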

Paper Structure (40 sections, 32 theorems, 122 equations, 2 figures, 4 algorithms)

Key Result

Theorem 1

Computing the optimal policy in Rising GTBs with general matrices $\mathbf{G}$ is NP-Hard.

Figures (2)

  • Figure 1: Examples of $3$-armed GTBs.
  • Figure 2: Instances used in the proof of Theorem \ref{thr:rottingregretLBgeneric}.

Theorems & Definitions (38)

  • Remark 1: Inclusion of Rested and Restless bandits in GTBs
  • Remark 2: On the Chosen Notion of Regret
  • Theorem 1: Complexity of finding the Optimal Policy in Rising GTBs
  • Theorem 2: Optimal Policy in Rising GTBs with Block-Diagonal CM
  • Theorem 3: Regret in Det. Rising GTBs with Block-Diagonal CMs
  • Definition 1: Block Sub-matrix
  • Theorem 4: Regret in Det. Rising GTBs with General Matrices
  • Remark 3: Computational Complexity
  • Lemma 1: Concentration of Estimator, adapted from metelli2022stochastic
  • Remark 4: Computational Complexity
  • ...and 28 more