Ensemble sampling for linear bandits: small ensembles suffice
David Janz, Alexander E. Litvak, Csaba Szepesvári
TL;DR
This work delivers the first useful, rigorous analysis of ensemble sampling for stochastic linear bandits, showing that a symmetrised ensemble of size $m=O(d\log T)$ yields a high-probability regret of order $(d\log T)^{5/2}\sqrt{T}$, even with infinite action sets. The analysis hinges on a Master Theorem for optimistic randomized algorithms and on careful control of the ensemble's singular values. The results establish that ensemble sampling is theoretically viable with modest ensemble sizes, and they offer a blueprint for extensions to generalized linear, kernelized, and deep-learning bandits. The bound is not tight with respect to $m$, which leaves sharper analyses and online ensemble growth strategies as open directions.
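To give a feel for the scalings quoted above, the following minimal sketch evaluates the orders $d\log T$ and $(d\log T)^{5/2}\sqrt{T}$ numerically. The function names are hypothetical, and all constants and lower-order factors are ignored (set to one), so the numbers indicate growth rates only, not actual ensemble sizes or regret values.

```python
import math

def ensemble_size_order(d, T):
    """Order of the ensemble size, d * log T (constants ignored)."""
    return d * math.log(T)

def regret_order(d, T):
    """Order of the regret bound, (d log T)^(5/2) * sqrt(T) (constants ignored)."""
    return (d * math.log(T)) ** 2.5 * math.sqrt(T)

# Compare an ensemble of order d log T with one that grows linearly in T.
d = 10
for T in (10**4, 10**6, 10**8):
    m = ensemble_size_order(d, T)
    print(f"T={T:>9}  d*log(T)={m:7.1f}  fraction of T={m / T:.2e}")
```

Under these assumptions, the ensemble size grows only logarithmically in the horizon, whereas an ensemble scaling linearly with $T$ would grow by a factor of 100 between consecutive rows.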
Abstract
We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble size of order $d \log T$ incurs regret of order at most $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the ensemble size to scale linearly with $T$ (a requirement that defeats the purpose of ensemble sampling) while still obtaining regret of near-$\sqrt{T}$ order. Our result is also the first to allow for infinite action sets.
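For readers unfamiliar with the method, here is a minimal sketch of an ensemble sampling loop for a linear bandit. It is an illustration under stated assumptions, not the exact algorithm analysed in the paper: the Gaussian perturbations, the regularisation parameter `lam`, the noise level `noise_sd`, and the finite action set (used so the argmax is a simple search) are all choices made here for simplicity. The random sign applied to the sampled member's deviation from the ridge estimate is one way to symmetrise the ensemble.

```python
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, lam=1.0, noise_sd=0.1, seed=0):
    """Run a simple ensemble sampling loop; return cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = lam * np.eye(d)        # regularised design matrix
    b = np.zeros(d)            # unperturbed sum of reward-weighted actions
    # Each member keeps its own perturbation of b (assumed Gaussian here).
    b_pert = np.sqrt(lam) * rng.standard_normal((m, d))
    best = np.max(actions @ theta_star)
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b                      # ridge regression estimate
        j = rng.integers(m)                        # pick a member uniformly
        sign = rng.choice([-1.0, 1.0])             # random sign (symmetrisation)
        theta_t = theta_hat + sign * (V_inv @ b_pert[j])
        x = actions[np.argmax(actions @ theta_t)]  # greedy for sampled parameter
        y = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += y * x
        # Update every member incrementally with fresh independent noise.
        b_pert += rng.standard_normal((m, 1)) * x
        total += best - x @ theta_star
        regret[t] = total
    return regret

# Hypothetical usage: 20 unit-norm actions in 3 dimensions, ensemble of size 10.
acts = np.random.default_rng(1).standard_normal((20, 3))
acts /= np.linalg.norm(acts, axis=1, keepdims=True)
curve = ensemble_sampling(acts, np.array([1.0, 0.0, 0.0]), T=200, m=10)
```

The point of the design is that each round updates all `m` members incrementally and samples one of them, rather than drawing from a posterior; the abstract's claim is that `m` of order $d \log T$ is enough for this to work in the linear setting.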
