Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
William Réveillard, Richard Combes
TL;DR
The paper studies stochastic bandits with multimodal mean rewards on a known tree, where the mean vector has at most $m$ modes. It introduces the first computationally tractable algorithm for solving the Graves-Lai optimization problem in this setting, combining discretization, dynamic programming on trees, and penalized subgradient descent to compute approximately optimal exploration rates. Solving the Graves-Lai problem enables OSSB-like asymptotically optimal algorithms and shows that purely local search strategies can be arbitrarily suboptimal in multimodal settings. The authors provide structural results, error bounds for the discretization, and empirical evidence of lower regret than unstructured baselines. The approach applies to a broad class of reward distributions on trees, and the authors release open-source code to support reproducibility and adoption in structured-bandit applications.
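To make the structural assumption concrete, the following minimal sketch lists the modes of a mean vector on a tree. It assumes that a mode is an arm whose mean is strictly larger than the means of all its neighbors; the paper's exact definition may differ (for example in how ties are handled). The function name `find_modes` and the adjacency-list representation are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Sequence


def find_modes(mu: Sequence[float], neighbors: Dict[int, List[int]]) -> List[int]:
    """Return the arms that are strict local maxima of mu on the graph.

    Assumption (not from the paper): arm k is a mode when mu[k] is strictly
    greater than mu[j] for every neighbor j of k.
    """
    return sorted(
        k for k, nbrs in neighbors.items()
        if all(mu[k] > mu[j] for j in nbrs)
    )


# A path 0 - 1 - 2 - 3 - 4 is a simple tree.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mu = [0.1, 0.5, 0.2, 0.8, 0.3]

modes = find_modes(mu, path)
print(modes)            # arms that are strict local maxima
print(len(modes) <= 2)  # checks the "at most m modes" constraint for m = 2
```

In this example arm 1 is a local but not global maximum, which illustrates why a search that only compares an arm with its neighbors can stall away from the best arm.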
Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most $m$ modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits.
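For readers unfamiliar with it, the Graves-Lai optimization problem for a structured bandit is usually written in the following generic form. The notation is an assumption chosen for illustration and may not match the paper's: $\mu$ is the true mean vector, $\mu^\star = \max_k \mu_k$ with optimal arm $k^\star$, $\eta_k \ge 0$ is the exploration rate of arm $k$, $d(\cdot,\cdot)$ is the Kullback-Leibler divergence between reward distributions parameterized by their means, and $\mathcal{M}$ is the set of mean vectors satisfying the structure (here, at most $m$ modes on the tree).

```latex
\begin{align*}
C(\mu) \;=\; \min_{\eta \ge 0} \quad & \sum_{k} \eta_k \,(\mu^\star - \mu_k) \\
\text{subject to} \quad & \sum_{k} \eta_k \, d(\mu_k, \lambda_k) \;\ge\; 1
  \quad \text{for all } \lambda \in \Lambda(\mu), \\
\text{where} \quad & \Lambda(\mu) \;=\; \bigl\{ \lambda \in \mathcal{M} \;:\;
  \lambda_{k^\star} = \mu_{k^\star},\ \max_{k} \lambda_k > \mu^\star \bigr\}.
\end{align*}
```

Under this formulation, any uniformly good algorithm has regret satisfying $\liminf_{T \to \infty} R(T)/\log T \ge C(\mu)$, and an algorithm is called asymptotically optimal when it matches this bound. The constraint set is indexed by a continuum of confusing parameters $\lambda$, which is why solving the problem tractably is the central difficulty.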
