Sharp analysis of linear ensemble sampling

Arya Akhavan; David Janz; Csaba Szepesvári

TL;DR

This work provides a sharp, Gaussian-perturbation analysis of linear ensemble sampling (ES) in stochastic linear bandits. By representing the Gaussian perturbations as diagonal Gaussian martingale transforms and embedding them into independent Brownian motions with clocked time changes via the Dambis–Dubins–Schwarz theorem, the authors reduce a complicated adaptive exploration problem to a time-uniform exceedance problem for Brownian motions. They prove that with ensemble size $m=\Theta(d\log n)$, ES achieves a high-probability regret of order $\tilde{O}(d^{3/2}\sqrt{n})$, closing the gap to Thompson sampling while keeping computational cost similar. A key technical contribution is a time-uniform lower bound on exceedance frequencies for $m$ Brownian motions, which, together with a master regret bound, yields the main regret guarantee. The work also develops a suite of continuous-time tools (DDS embedding, Ornstein–Uhlenbeck time changes) and provides a near-tight ensemble-size lower bound, highlighting the necessity of $m$ growing with $d$ and ruling out too-small ensembles in general. The approach opens avenues for applying continuous-time embeddings to discrete-time learning analyses and suggests potential extensions to non-Gaussian perturbations and nonlinear models.
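For reference, the Dambis–Dubins–Schwarz (DDS) theorem mentioned above can be written in its standard textbook form. The notation ($M$, $\tau_t$, $B$) is chosen here for illustration and is not taken from the paper, whose own embedding result (for diagonal martingale transforms of Gaussian noise) may be stated differently.

```latex
Let $(M_t)_{t \ge 0}$ be a continuous local martingale with $M_0 = 0$ and
$\langle M \rangle_\infty = \infty$ almost surely. Define the time change
\[
  \tau_t = \inf\{ s \ge 0 : \langle M \rangle_s > t \}.
\]
Then $B_t = M_{\tau_t}$ is a standard Brownian motion, and
\[
  M_t = B_{\langle M \rangle_t} \qquad \text{for all } t \ge 0 .
\]
```

In words, the martingale is a Brownian motion run on the clock given by its own quadratic variation, which is the sense in which the perturbations are embedded into Brownian motions "with clocked time changes".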

Abstract

We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. Intriguingly, this continuous-time lens is not forced; it appears natural, and perhaps necessary: the discrete-time problem seems to be asking for a continuous-time solution, and we know of no other way to obtain a sharp ES bound.
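To make the object of study concrete, here is a minimal sketch of one common formulation of linear ensemble sampling with Gaussian perturbations. It is not the paper's Algorithm 1, which is not reproduced on this page. The finite action set, the regulariser `lam`, the unit perturbation scale, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def linear_ensemble_sampling(arms, theta_star, n, m, lam=1.0, noise_sd=1.0, seed=0):
    """Run a simple linear ensemble sampling loop and return cumulative regret.

    arms: (K, d) array of actions (a finite action set is an assumption).
    theta_star: (d,) true parameter, used only to simulate rewards and regret.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    V = lam * np.eye(d)  # regularised design matrix, shared by all members
    # Each ensemble member keeps its own perturbed response vector,
    # initialised with a Gaussian perturbation (assumed scale sqrt(lam)).
    S = np.sqrt(lam) * rng.standard_normal((m, d))
    best = np.max(arms @ theta_star)
    regret = np.zeros(n)
    total = 0.0
    for t in range(n):
        thetas = np.linalg.solve(V, S.T).T      # (m, d) perturbed estimates
        j = rng.integers(m)                     # pick a member uniformly at random
        x = arms[np.argmax(arms @ thetas[j])]   # act greedily for that member
        y = x @ theta_star + noise_sd * rng.standard_normal()
        # Incremental update: every member sees the same reward plus its own
        # fresh standard Gaussian perturbation (unit scale is an assumption).
        V += np.outer(x, x)
        S += (y + rng.standard_normal(m))[:, None] * x[None, :]
        total += best - x @ theta_star
        regret[t] = total
    return regret
```

A call such as `linear_ensemble_sampling(arms, theta_star, n=1000, m=int(np.ceil(d * np.log(1000))))` mirrors the $m=\Theta(d\log n)$ scaling in the abstract, up to an unspecified constant. The sketch also shows why the members are correlated over time: each parameter is updated incrementally rather than redrawn, unlike in Thompson sampling.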

Paper Structure

This paper contains 31 sections, 22 theorems, 126 equations, 1 figure, 1 table, 1 algorithm.

Introduction
Technical contributions
Proof insight and outlook
Notation and problem setting
Linear bandits
Ensemble sampling
Algorithm description
Noise scale
Computational complexity
Current state of the art
Results
Proof of Theorem 4.1
An embedding result for diagonal martingale transforms of Gaussian noise
Exceedance frequencies of Brownian motions
Putting things together
...and 16 more sections

Key Result

lemma 3.1

With probability at least $1-\delta$, $\theta_\star\in \bigcap_{t\in \mathbf{N}_+} \Theta_{t-1}^\delta$, where

Figures (1)

Figure 1: Schematic geometry for $\mathcal{X} = \mathbf{B}^d_2$: the confidence ellipsoid (solid) is off-center and intersects level sets of the optimal reward $J(\theta) = \max_{x\in \mathcal{X}} \langle x,\theta\rangle = \|\theta\|$ (dashed). For illustration purposes, a sample of $m$ perturbed model parameters is drawn from a Gaussian with covariance inflated by about $1.2\times$ relative to the ellipsoid axes; those whose optimal reward exceeds that of the true parameter are highlighted as optimistic (red coloured dots). The master regret bound states that if we can keep the exceedance frequency of optimistic samples above a constant, then the regret will be under control. The difficulty with ensemble sampling is that, unlike in Thompson sampling, the model parameters are not freshly drawn in each time step but evolve in a correlated fashion over time. As such, controlling the exceedance frequencies is more challenging, and this is the main technical contribution of the paper.
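The exceedance frequency described in the caption can be sketched numerically for the unit-ball case, where $J(\theta) = \|\theta\|$. This is only an illustration under stated assumptions: the samples are drawn independently from a single Gaussian, as in the figure, rather than evolved over time as in ES, and the function name and arguments (`centre`, `cov`) are hypothetical.

```python
import numpy as np

def exceedance_frequency(theta_star, centre, cov, m, seed=0):
    """Fraction of m Gaussian samples that are optimistic when X is the unit ball."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(centre, cov, size=m)  # (m, d) perturbed parameters
    # For the unit ball, J(theta) = max_x <x, theta> = ||theta||_2, so a sample
    # is optimistic when its norm is at least that of the true parameter.
    optimistic = np.linalg.norm(samples, axis=1) >= np.linalg.norm(theta_star)
    return optimistic.mean()
```

Calling it with `centre` set to an off-centre estimate and `cov` set to an inflated version of the ellipsoid's shape matrix reproduces the kind of picture the figure describes. The paper's difficulty lies in lower-bounding this quantity uniformly over time when the samples are correlated across rounds.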

Theorems & Definitions (23)

lemma 3.1: Confidence ellipsoids (abbasi2011improved)
theorem 4.1: Regret bound for Algorithm 1, linear ensemble sampling
theorem 4.2: (unnamed lower bound)
lemma 5.1: Self-normalized exceedance frequencies control regret
lemma 5.2: From fixed to uniform direction
theorem 5.3: Diagonal Gaussian embedding
theorem 5.4: Time-uniform exceedance count for Brownian motions
proposition 5.5: Fixed-direction exceedance control premise
corollary 5.6: Uniform exceedance control for ES
theorem A.1: Master regret bound (janz2023ensemble)
...and 13 more
