Ensemble sampling for linear bandits: small ensembles suffice

David Janz; Alexander E. Litvak; Csaba Szepesvári

TL;DR

This work delivers the first useful, rigorous analysis of ensemble sampling for stochastic linear bandits, showing that a symmetrised ensemble of size $m=O(d\log T)$ yields a high-probability regret of $R(T)=\tilde{O}((d\log T)^{5/2}\sqrt{T})$ even with infinite action sets. The analysis hinges on a Master Theorem for optimistic randomized algorithms and a careful control of the ensemble's singular values, enabling a nontrivial bound in the structured linear setting. The results position ensemble sampling as theoretically viable with modest ensemble sizes, and offer a blueprint for extending to generalized linear, kernelized, and deep-learning bandits. While the bound is not tight with respect to $m$, it marks a fundamental advance and highlights open directions for sharper analysis and online ensemble growth strategies. Overall, the paper clarifies when and why small ensembles can be effective in high-dimensional exploration problems.

Abstract

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Our result is also the first to allow for infinite action sets.

Paper Structure (20 sections, 19 theorems, 98 equations, 3 algorithms)

Introduction
Linear ensemble sampling
Problem setting: stochastic linear bandits
Algorithm: linear ensemble sampling
Regret bound for linear ensemble sampling
Comparison to related results
Analysis of linear ensemble sampling
Master Theorem: a regret bound for optimistic randomised algorithms
Proof of \ref{claim:es}: regret bound for linear ensemble sampling
Setting up to prove \ref{thm:sing-val-gamma}: bound on singular values
Proof of \ref{lem:process-max-lower-bound}
Discussion
A reformulation of \ref{alg:es} in the style of \cite{lu2017ensemble}
Proof of \ref{thm:master}: master regret bound
Proof of \ref{lem:thetaopt-lower-bound}: optimism for elliptical confidence sets
...and 5 more sections

Key Result

Theorem 1

Fix $\delta \in (0,1]$ and take $r_{t} = 7\beta_t^\delta$ for all $t \in \mathbb{N}$, $\lambda \geq 5$ and $m \geq 400 \log (NT/\delta)$ for $N = (134 \sqrt{1+T/\lambda})^d$. Then there exists a universal constant $C>0$ such that, with probability at least $1-\delta$, the regret incurred by a learne
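To get a feel for the ensemble-size condition in Theorem 1, the sketch below evaluates the smallest integer $m$ satisfying $m \geq 400 \log(NT/\delta)$ with $N = (134\sqrt{1+T/\lambda})^d$. The function name `min_ensemble_size` and the default values for `lam` and `delta` are illustrative assumptions, not taken from the paper. The computation works in log-space, since $N$ itself overflows floating point for moderate $d$.

```python
import math


def min_ensemble_size(d: int, T: int, lam: float = 5.0, delta: float = 0.05) -> int:
    """Smallest integer m with m >= 400 * log(N * T / delta),
    where N = (134 * sqrt(1 + T / lam)) ** d, as in the Theorem 1 condition.

    The name and the defaults (lam=5.0, delta=0.05) are illustrative choices.
    """
    if d < 1 or T < 1:
        raise ValueError("d and T must be positive integers")
    if not (0.0 < delta <= 1.0):
        raise ValueError("delta must lie in (0, 1]")
    if lam < 5.0:
        raise ValueError("Theorem 1 is stated for lambda >= 5")

    # log N = d * log(134 * sqrt(1 + T / lam)); avoids forming N directly.
    log_N = d * (math.log(134.0) + 0.5 * math.log1p(T / lam))
    # log(N * T / delta) = log N + log T - log delta
    log_term = log_N + math.log(T) - math.log(delta)
    return math.ceil(400.0 * log_term)


if __name__ == "__main__":
    for d in (2, 10, 50):
        for T in (10**3, 10**6):
            print(f"d={d:3d}  T={T:8d}  m >= {min_ensemble_size(d, T)}")
```

Since $\log N$ is linear in $d$ and logarithmic in $T$, the required $m$ grows as $d \log T$, matching the ensemble size quoted in the abstract. The explicit constants are large, so for small horizons the numerical value of this sufficient condition can exceed $T$ itself.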

Theorems & Definitions (59)

Theorem 1
Remark 1
Remark 2
Remark 3
Remark 4
Remark 5
Remark 6
Remark 7
Remark 8
Theorem 2: Master regret bound
...and 49 more
