Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

William Réveillard; Richard Combes

TL;DR

The paper studies stochastic bandits whose mean reward vector is multimodal on a known tree, with at most $m$ modes. It introduces the first computationally tractable algorithm for solving the Graves-Lai optimization problem in this setting, combining discretization, dynamic programming on trees, and penalized subgradient descent to compute approximately optimal exploration rates. Solving the Graves-Lai problem enables OSSB-like asymptotically optimal algorithms and shows that purely local search strategies can be arbitrarily suboptimal in multimodal settings. The authors also provide structural results, error bounds for the discretization, and empirical evidence of lower regret than unstructured baselines. The approach applies to a broad class of reward distributions on trees, and the authors release open-source code to support reproducibility and adoption in structured-bandit applications.

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most $m$ modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
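To make the constraint "at most $m$ modes" concrete, here is a minimal sketch that counts the modes of a mean vector on a tree. It is an illustration under stated assumptions, not the authors' code: it assumes a mode is a node whose mean is strictly larger than the mean of every neighbor (the text above does not spell out the definition, and ties may be treated differently in the paper). The function name `count_modes`, the adjacency-dictionary representation, and the example values are all hypothetical.

```python
from typing import Dict, List, Sequence


def count_modes(mu: Sequence[float], neighbors: Dict[int, List[int]]) -> int:
    """Count nodes whose mean strictly exceeds the mean of every neighbor.

    Assumption: a "mode" is a strict local maximum of mu on the graph.
    """
    return sum(
        1
        for node, adj in neighbors.items()
        if all(mu[node] > mu[other] for other in adj)
    )


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4, which is a tree.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mu = [0.2, 0.6, 0.3, 0.8, 0.5]

m = 2  # assumed bound on the number of modes
num_modes = count_modes(mu, path)
satisfies_constraint = num_modes <= m
print(num_modes, satisfies_constraint)
```

In this example the strict local maxima are nodes 1 and 3, so the vector is admissible for $m = 2$ but would not be for $m = 1$.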
