
Ensemble sampling for linear bandits: small ensembles suffice

David Janz, Alexander E. Litvak, Csaba Szepesvári

TL;DR

This work gives the first useful, rigorous analysis of ensemble sampling for stochastic linear bandits. It shows that a symmetrised ensemble of size $m=O(d\log T)$ achieves high-probability regret $R(T)=\tilde{O}((d\log T)^{5/2}\sqrt{T})$, even with infinite action sets. The analysis rests on a master regret bound for optimistic randomized algorithms and on careful control of the ensemble's singular values. The results show that ensemble sampling is theoretically viable with modest ensemble sizes, and they offer a blueprint for extensions to generalized linear, kernelized, and deep-learning bandits. The bound is not tight with respect to $m$, which leaves open directions for sharper analysis and for strategies that grow the ensemble online. Overall, the paper clarifies when and why small ensembles suffice for exploration in high-dimensional problems.

Abstract

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Our result is also the first to allow for infinite action sets.

Paper Structure (20 sections, 19 theorems, 98 equations, 3 algorithms)

Key Result

Theorem 1

Fix $\delta \in (0,1]$ and take $r_{t} = 7\beta_t^\delta$ for all $t \in \mathbb{N}$, $\lambda \geq 5$ and $m \geq 400 \log (NT/\delta)$ for $N = (134 \sqrt{1+T/\lambda})^d$. Then there exists a universal constant $C>0$ such that, with probability at least $1-\delta$, the regret incurred by a learner ...
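
To make the ensemble-size condition in Theorem 1 concrete, the following minimal sketch evaluates the threshold $400 \log(NT/\delta)$ with $N = (134\sqrt{1+T/\lambda})^d$. The function name `min_ensemble_size` and its argument names are illustrative assumptions, not taken from the paper. Because $N$ is exponential in $d$, the computation is done in log-space, using $\log(NT/\delta) = d \log(134\sqrt{1+T/\lambda}) + \log T - \log \delta$.

```python
import math


def min_ensemble_size(d, T, delta, lam=5.0):
    """Smallest integer m satisfying m >= 400 * log(N * T / delta),
    where N = (134 * sqrt(1 + T / lam)) ** d, as in the condition of Theorem 1.

    The name and signature are hypothetical; only the formula comes from the text.
    """
    if not (0.0 < delta <= 1.0):
        raise ValueError("delta must lie in (0, 1]")
    if lam < 5.0:
        raise ValueError("Theorem 1 assumes lambda >= 5")
    if d < 1 or T < 1:
        raise ValueError("d and T must be positive")

    # log N = d * log(134 * sqrt(1 + T / lam)); computed directly in log-space
    # because N itself overflows floating point even for moderate d.
    log_N = d * math.log(134.0 * math.sqrt(1.0 + T / lam))

    # log(N * T / delta) = log N + log T - log delta
    log_term = log_N + math.log(T) - math.log(delta)

    return math.ceil(400.0 * log_term)


if __name__ == "__main__":
    # Illustrative values only: show how the threshold moves with d and T.
    for d in (5, 10, 20):
        for T in (10**4, 10**6):
            print(d, T, min_ensemble_size(d, T, delta=0.1))
```

The dominant term is $d \log(134\sqrt{1+T/\lambda})$, so the threshold grows linearly in $d$ and only logarithmically in $T$, which matches the $d \log T$ scaling stated in the abstract. The explicit constants are large, so the sketch is best read as an illustration of the scaling rather than as guidance on practical ensemble sizes.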

Theorems & Definitions (59)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Remark 7
  • Remark 8
  • Theorem 2: Master regret bound
  • ...and 49 more