Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

William Réveillard, Richard Combes

TL;DR

The paper tackles stochastic bandits with multimodal mean rewards on known trees, where the mean vector has at most $m$ modes. It introduces the first computationally tractable algorithm to solve the Graves-Lai optimization problem in this setting, by blending discretization, dynamic programming on trees, and penalized subgradient descent to yield approximate optimal exploration rates. Solving the Graves-Lai problem enables OSSB-like asymptotically optimal algorithms and reveals that purely local search strategies can be arbitrarily suboptimal in multimodal scenarios; the authors provide structural results, error bounds for discretization, and empirical evidence showing improved regret over unstructured baselines. The approach is applicable to a broad class of reward distributions on trees, and the authors release open-source code to facilitate reproducibility and adoption in structured-bandit applications.

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
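
To make the multimodality constraint concrete, the sketch below counts the modes of a mean vector on a tree. It assumes that a mode is an arm whose mean strictly exceeds the mean of each of its neighbors; the paper's formal definition may differ in how it treats ties. The function names, the heap-style node labeling, and the mean values are hypothetical. They are chosen only so that the instance has $K=7$ arms on a binary tree, modes $\{4,6\}$, and best arm $6$.

```python
from typing import Dict, List

def find_modes(neighbors: Dict[int, List[int]], mu: Dict[int, float]) -> List[int]:
    """Return the arms whose mean strictly exceeds that of every neighbor.

    Assumption: a mode is a strict local maximum of mu on the graph.
    """
    return sorted(
        k for k in neighbors
        if all(mu[k] > mu[j] for j in neighbors[k])
    )

def has_at_most_m_modes(neighbors: Dict[int, List[int]],
                        mu: Dict[int, float], m: int) -> bool:
    """Check whether mu has at most m modes on the given graph."""
    return len(find_modes(neighbors, mu)) <= m

# Binary tree with K = 7 arms; heap-style labels are an assumption
# (node 1 is the root, node k has children 2k and 2k + 1).
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
neighbors: Dict[int, List[int]] = {k: [] for k in range(1, 8)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

# Hypothetical mean rewards, not taken from the paper.
mu = {1: 0.3, 2: 0.4, 3: 0.5, 4: 0.6, 5: 0.2, 6: 0.9, 7: 0.1}

modes = find_modes(neighbors, mu)
best_arm = max(mu, key=mu.get)
print(modes, best_arm)  # [4, 6] 6
```

In this instance arm 4 is a suboptimal mode, which is the kind of local maximum at which a purely local search strategy could stall.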

Paper Structure

This paper contains 52 sections, 16 theorems, 77 equations, 6 figures, 1 algorithm.

Key Result

Proposition 1

Consider an algorithm such that $\lim_{T \to \infty} \frac{R(\boldsymbol{\mu},T)}{T^{\alpha}} = 0$ for all $\alpha > 0$ and all $\boldsymbol{\mu} \in \mathcal{F}_{\le m}$. Then its asymptotic regret is lower bounded as $\lim \inf_{T \to \infty} \frac{R(\boldsymbol{\mu},T)}{\ln{T}} \ge C(m,\boldsymbo

Figures (6)

  • Figure 1: Summary of the procedure to solve the Graves-Lai optimization problem (PGL).
  • Figure 2: 2-modal example.
  • Figure 3: Cumulative regret as a function of the number of rounds.
  • Figure 4: $2$-modal reward instances on the binary tree $G$ ($K=7$, $\mathcal{M}(\boldsymbol{\mu})=\{4,6\}$, $k^\star(\boldsymbol{\mu})=6$).
  • Figure 5: Runtime as a function of the number of arms.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Proposition 1
  • Theorem 1: Theorem 2 in Combes et al. (2017)
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Proposition 7
  • Proposition 8
  • Corollary 1
  • ...and 10 more