Imprecise Multi-Armed Bandits

Vanessa Kosoy

TL;DR

This work introduces imprecise bandits, a framework in which the outcomes of each arm are drawn from a credal set rather than a fixed distribution, with performance measured via lower previsions. The key idea is to generalize stochastic and adversarial bandits by modeling uncertainty with time-stationary credal sets tied to a hypothesis class with linear structure, which enables regret guarantees through optimistic planning over a confidence set of hypotheses. The main algorithm, IUCB, exploits the geometry of confidence sets in an extended space to achieve regret of order $\tilde{O}(\sqrt{N})$ in the general case and logarithmic regret in the positive-gap regime, with lower bounds showing that dependence on core parameters such as $D_Z$, $S$, and $R$ is necessary. The framework unifies and extends stochastic linear bandits and certain zero-sum game settings, offering a principled approach to decision-making under ambiguous distributions with structured adversarial variation, and points to future work on anytime guarantees and reinforcement learning extensions.

Abstract

We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can be richer than just the reward). The arm-to-credal-set correspondence comes from a known class of hypotheses. We then define a notion of regret corresponding to the lower prevision defined by these credal sets. Equivalently, the setting can be regarded as a two-player zero-sum game, where, on each round, the agent chooses an arm and the adversary chooses the distribution over outcomes from a set of options associated with this arm. The regret is defined with respect to the value of the game. For certain natural hypothesis classes, loosely analogous to stochastic linear bandits (which are a special case of the resulting setting), we propose an algorithm and prove a corresponding upper bound on regret. We also prove lower bounds on regret for particular special cases.
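
The game-theoretic reading in the abstract can be made concrete with a small sketch. It assumes a finite outcome set, a known reward for each outcome, and credal sets given by finitely many extreme-point distributions (since expectation is linear, the minimum over a convex set is attained at an extreme point). The names `lower_prevision`, `game_value`, and `regret`, the toy arms, and the regret formula (the number of rounds times the game value, minus the total reward received) are illustrative assumptions and not the paper's definitions, which may differ in detail.

```python
from typing import Dict, List, Sequence

Distribution = Sequence[float]  # probabilities over a finite outcome set


def expectation(dist: Distribution, reward: Sequence[float]) -> float:
    """Expected reward of one distribution over outcomes."""
    return sum(p * r for p, r in zip(dist, reward))


def lower_prevision(credal_set: List[Distribution], reward: Sequence[float]) -> float:
    """Worst-case expected reward over the extreme points of a credal set."""
    return min(expectation(d, reward) for d in credal_set)


def game_value(credal_sets: Dict[str, List[Distribution]], reward: Sequence[float]) -> float:
    """Best worst-case expected reward over arms (agent maximizes, adversary minimizes)."""
    return max(lower_prevision(cs, reward) for cs in credal_sets.values())


def regret(credal_sets: Dict[str, List[Distribution]],
           reward: Sequence[float],
           received: Sequence[float]) -> float:
    """Assumed regret: N times the game value minus the total reward received."""
    return len(received) * game_value(credal_sets, reward) - sum(received)


# Hypothetical toy instance: outcomes carry rewards -1, 0, +1.
reward = [-1.0, 0.0, 1.0]
credal_sets = {
    "a": [[0.25, 0.0, 0.75], [0.0, 0.5, 0.5]],  # worst case 0.5
    "b": [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]],    # best case 1.0, worst case -0.5
}

value = game_value(credal_sets, reward)
total_regret = regret(credal_sets, reward, [1.0, 0.0, -1.0, 1.0])
```

In this toy instance arm "b" has the higher best case but the lower worst case, so the lower prevision favors arm "a", and the benchmark an agent is compared against is the worst-case guarantee rather than the best achievable mean.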

Paper Structure

This paper contains 54 sections, 85 theorems, 713 equations, 1 table, and 4 algorithms.

Key Result

Theorem 2.1

Let $N$ be a positive integer. Then, for any $H:\mathcal{A}\rightarrow\Delta[-1,+1]$

Theorems & Definitions (180)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4: Dani-Hayes-Kakade
  • Theorem 2.5: Dani-Hayes-Kakade
  • Theorem 2.6
  • Theorem 2.7: O'Donoghue-Lattimore-Osband
  • Example 2.1
  • Example 2.2
  • Example 3.1
  • ...and 170 more