Sharp analysis of linear ensemble sampling

Arya Akhavan, David Janz, Csaba Szepesvári

TL;DR

This work provides a sharp, Gaussian-perturbation analysis of linear ensemble sampling (ES) in stochastic linear bandits. By representing the Gaussian perturbations as diagonal Gaussian martingale transforms and embedding them into independent Brownian motions with clocked time changes via the Dambis–Dubins–Schwarz theorem, the authors reduce a complicated adaptive exploration problem to a time-uniform exceedance problem for Brownian motions. They prove that with ensemble size $m=\Theta(d\log n)$, ES achieves a high-probability regret of order $\tilde{O}(d^{3/2}\sqrt{n})$, closing the gap to Thompson sampling while keeping computational cost similar. A key technical contribution is a time-uniform lower bound on exceedance frequencies for $m$ Brownian motions, which, together with a master regret bound, yields the main regret guarantee. The work also develops a suite of continuous-time tools (DDS embedding, Ornstein–Uhlenbeck time changes) and provides a near-tight ensemble-size lower bound, highlighting the necessity of $m$ growing with $d$ and ruling out too-small ensembles in general. The approach opens avenues for applying continuous-time embeddings to discrete-time learning analyses and suggests potential extensions to non-Gaussian perturbations and nonlinear models.
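To make the phrase "exceedance frequency" concrete, the following is a minimal simulation sketch, not the paper's construction. It assumes (hypothetically) that a Brownian motion "exceeds" at time $t$ when $B_t \ge c\sqrt{t}$ for a constant $c$ (called `level` below), simulates $m$ independent paths on a grid, and reports the smallest and the average fraction of paths exceeding over the grid. The function name, the threshold, and the grid are all illustrative choices.

```python
import numpy as np

def min_exceedance_frequency(m, horizon=100.0, steps=1000, level=1.0, seed=0):
    """Simulate m Brownian motions; return (min, mean) over time of the
    fraction of paths with B_t >= level * sqrt(t).

    The threshold level * sqrt(t) is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / steps
    times = dt * np.arange(1, steps + 1)                 # grid t_1, ..., t_steps
    increments = np.sqrt(dt) * rng.standard_normal((m, steps))
    paths = np.cumsum(increments, axis=1)                # row j is path B^j on the grid
    exceed = paths >= level * np.sqrt(times)             # (m, steps) boolean array
    freq = exceed.mean(axis=0)                           # exceedance frequency at each time
    return float(freq.min()), float(freq.mean())
```

For a single fixed time the expected frequency is $1-\Phi(c)$; the time-uniform question is how far the minimum over all times can fall below that, which is where the ensemble size $m$ enters.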

Abstract

We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. Intriguingly, this continuous-time lens is not forced; it appears natural, and perhaps necessary: the discrete-time problem seems to be asking for a continuous-time solution, and we know of no other way to obtain a sharp ES bound.
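As a rough illustration of the kind of algorithm being analysed, here is a simplified sketch of linear ensemble sampling on the unit-ball action set. It is not the paper's algorithm as stated: the regulariser `lam`, the unit perturbation scale, the constant 1 in $m = \lceil d\log n\rceil$, and the uniform choice of ensemble member each round are all assumptions made for the example. Each member $j$ keeps a perturbation sum $S^j = \sum_s \xi^j_s x_s$ with independent standard Gaussian $\xi^j_s$, so the members evolve in a correlated fashion rather than being redrawn each round.

```python
import math
import numpy as np

def linear_ensemble_sampling(theta_star, n, m=None, lam=1.0, noise_sd=1.0, seed=0):
    """Run a simplified linear ensemble sampler on the unit ball.

    Returns the cumulative (pseudo-)regret after n rounds.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    if m is None:
        m = max(1, math.ceil(d * math.log(n)))  # assumed constant 1 in Theta(d log n)
    V = lam * np.eye(d)                          # regularised design matrix
    b = np.zeros(d)                              # running sum of x_s * y_s
    # One perturbation vector per member, initialised with a Gaussian draw.
    S = math.sqrt(lam) * rng.standard_normal((m, d))
    best = np.linalg.norm(theta_star)            # optimal reward on the unit ball
    regret = 0.0
    for _ in range(n):
        theta_hat = np.linalg.solve(V, b)        # ridge estimate
        j = rng.integers(m)                      # pick one member uniformly at random
        theta_j = theta_hat + np.linalg.solve(V, S[j])
        norm = np.linalg.norm(theta_j)
        x = theta_j / norm if norm > 0 else np.eye(d)[0]  # greedy action for theta_j
        y = x @ theta_star + noise_sd * rng.standard_normal()
        regret += best - x @ theta_star
        V += np.outer(x, x)
        b += y * x
        # Every member receives its own Gaussian multiple of the played action.
        S += rng.standard_normal((m, 1)) * x
    return float(regret)
```

A call such as `linear_ensemble_sampling(np.array([1.0, 0.0, 0.0]), n=500)` runs the sketch with $d=3$. The per-round cost is dominated by the $m \times d$ update of `S`, which is the sense in which computation stays comparable to a single Thompson-sampling draw up to the ensemble factor.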

Paper Structure (31 sections, 22 theorems, 126 equations, 1 figure, 1 table, 1 algorithm)


Key Result

lemma 3.1

With probability at least $1-\delta$, $\theta_\star\in \bigcap_{t\in \mathbf{N}_+} \Theta_{t-1}^\delta$, where

Figures (1)

  • Figure 1: Schematic geometry for $\mathcal{X} = \mathbf{B}^d_2$: the confidence ellipsoid (solid) is off-center and intersects level sets of the optimal reward $J(\theta) = \max_{x\in \mathcal{X}} \langle x,\theta\rangle = \|\theta\|$ (dashed). For illustration purposes, a sample of $m$ perturbed model parameters is drawn from a Gaussian with covariance inflated by about $1.2\times$ relative to the ellipsoid axes; those whose optimal reward exceeds that of the true parameter are highlighted as optimistic (red coloured dots). The master regret bound states that if we can keep the exceedance frequency of optimistic samples above a constant, then the regret will be under control. The difficulty with ensemble sampling is that, unlike in Thompson sampling, the model parameters are not freshly drawn in each time step but evolve in a correlated fashion over time. As such, controlling the exceedance frequencies is more challenging, and this is the main technical contribution of the paper.
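The quantity the caption calls the exceedance frequency of optimistic samples can be computed directly in the unit-ball case, where $J(\theta) = \|\theta\|$. The sketch below is illustrative only: it draws fresh Gaussian samples around an estimate `theta_hat` with covariance proportional to $V^{-1}$, treats the factor `inflation` (1.2 here, echoing the caption) as a scale on the standard deviation, and returns the fraction of samples with $J(\theta_j) \ge J(\theta_\star)$. The function name and all arguments are hypothetical, and in ensemble sampling proper the samples would not be fresh draws.

```python
import numpy as np

def optimistic_fraction(theta_star, theta_hat, V, m, inflation=1.2, seed=0):
    """Fraction of m Gaussian samples whose optimal reward on the unit ball
    is at least that of theta_star, i.e. ||theta_j|| >= ||theta_star||."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    # If V = L L^T, then L^{-T} z has covariance V^{-1} for standard normal z.
    L = np.linalg.cholesky(V)
    Z = rng.standard_normal((m, d))
    samples = theta_hat + inflation * np.linalg.solve(L.T, Z.T).T
    J = np.linalg.norm(samples, axis=1)          # J(theta) = ||theta|| on the unit ball
    return float(np.mean(J >= np.linalg.norm(theta_star)))
```

For example, `optimistic_fraction(np.array([1.0, 0.0]), np.array([0.9, 0.1]), 4.0 * np.eye(2), m=100)` gives the share of optimistic samples for an off-centre estimate in two dimensions.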

Theorems & Definitions (23)

  • lemma 3.1: Confidence ellipsoids (Abbasi-Yadkori et al., 2011)
  • theorem 4.1: Regret bound for linear ensemble sampling
  • theorem 4.2: Lower bound
  • lemma 5.1: Self-normalized exceedance frequencies control regret
  • lemma 5.2: From fixed to uniform direction
  • theorem 5.3: Diagonal Gaussian embedding
  • theorem 5.4: Time-uniform exceedance count for Brownian motions
  • proposition 5.5: Fixed-direction exceedance control premise
  • corollary 5.6: Uniform exceedance control for ES
  • theorem A.1: Master regret bound (Janz et al., 2023)
  • ...and 13 more