Statistical Inference under Adaptive Sampling with LinUCB

Wei Fan, Kevin Tan, Yuting Wei

TL;DR

This work develops statistical inference under adaptive data collection for stochastic linear bandits by analyzing LinUCB with a unit-ball action set. It proves asymptotic normality of the LinUCB estimator projected onto the tangent space (θ*)⊥, with a T^{-1/4} convergence rate, which yields Wald-type confidence sets that are distribution-based and asymptotically tighter than prior nonasymptotic bounds. A key technical contribution is a precise, nonasymptotic characterization of the design covariance Λ_T: a rank-one signal along θ* plus an almost isotropic bulk that grows at rate √T, obtained through a four-phase analysis of its evolution. The results imply an effective sample size of order √T, justify inference under adaptivity, and are supported by simulations illustrating both the normality and the tightening of confidence sets in adaptive linear bandit settings.

Abstract

Adaptively collected data has become ubiquitous in modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect estimators of certain parameters to be asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm: the estimation error is asymptotically normal, with convergence at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
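The eigenstructure described above (one dominant direction aligned with the true parameter, plus a much smaller bulk) can be explored with a small simulation. The following is a minimal sketch, not the paper's implementation: it assumes $d = 2$, Gaussian noise, a ridge parameter of 1, and replaces the exact maximization of the UCB index over the unit ball with a search over a grid of boundary points. The function name `run_linucb_circle` and all default parameter values are hypothetical choices made for illustration.

```python
import math
import random


def run_linucb_circle(theta_star, T=1000, beta=2.0, ridge=1.0,
                      noise_sd=0.1, n_angles=360, seed=0):
    """Approximate LinUCB on the unit disc (d = 2) using a grid of unit-norm actions.

    Returns (lam_max, lam_min, top_vec, theta_hat) for the final design covariance.
    """
    rng = random.Random(seed)
    # The UCB index (linear term plus a norm) is convex in the action, so its
    # maximum over the disc lies on the boundary; we search a grid of boundary
    # points, which is an approximation of the exact maximizer.
    actions = [(math.cos(2 * math.pi * k / n_angles),
                math.sin(2 * math.pi * k / n_angles)) for k in range(n_angles)]
    l11, l12, l22 = ridge, 0.0, ridge   # Lambda = ridge * I + sum_t a_t a_t^T
    b1 = b2 = 0.0                       # b = sum_t r_t a_t

    for _ in range(T):
        det = l11 * l22 - l12 * l12
        i11, i12, i22 = l22 / det, -l12 / det, l11 / det      # Lambda^{-1}
        th1, th2 = i11 * b1 + i12 * b2, i12 * b1 + i22 * b2   # ridge estimate

        def ucb(a):
            x, y = a
            width = math.sqrt(i11 * x * x + 2 * i12 * x * y + i22 * y * y)
            return x * th1 + y * th2 + beta * width

        x, y = max(actions, key=ucb)
        reward = x * theta_star[0] + y * theta_star[1] + rng.gauss(0.0, noise_sd)
        l11, l12, l22 = l11 + x * x, l12 + x * y, l22 + y * y
        b1, b2 = b1 + reward * x, b2 + reward * y

    # Final ridge estimate and closed-form eigendecomposition of the 2x2 matrix.
    det = l11 * l22 - l12 * l12
    theta_hat = ((l22 * b1 - l12 * b2) / det, (l11 * b2 - l12 * b1) / det)
    mean = 0.5 * (l11 + l22)
    gap = math.hypot(0.5 * (l11 - l22), l12)
    angle = 0.5 * math.atan2(2 * l12, l11 - l22)   # principal-axis angle
    top_vec = (math.cos(angle), math.sin(angle))
    return mean + gap, mean - gap, top_vec, theta_hat


if __name__ == "__main__":
    lam_max, lam_min, top_vec, theta_hat = run_linucb_circle((0.6, 0.8))
    print("eigenvalues:", lam_max, lam_min)
    print("top eigenvector:", top_vec, "estimate:", theta_hat)
```

Because every action has unit norm, the two eigenvalues always sum to `2 * ridge + T`, so comparing `lam_min` against `lam_max` across several values of `T` gives a rough view of how slowly the non-leading direction accumulates information.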

Paper Structure

This paper contains 84 sections, 19 theorems, 421 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

Under Assumptions aspt:unconstrained--aspt:subgaussian, fix any matrix $\bm U\in\mathbb R^{d\times(d-1)}$ with orthonormal columns orthogonal to $\bm\theta^{\star}$; equivalently, let $\bm{Q} = (\bm{\theta}^{\star}, \bm{U})$, then $\bm{Q}^{\top}\bm{Q} = \bm{I}_d$. With $\beta\gg d^2(\sigma\sqrt{d+\l
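The change of basis in the theorem statement can be made concrete. The sketch below builds a matrix $\bm U$ with orthonormal columns orthogonal to a unit-norm $\bm\theta^{\star}$ via Gram-Schmidt and projects an estimation error onto those columns. It is only an illustration of the construction: the helper names `orthonormal_complement` and `tangent_coordinates` are hypothetical, and unit norm of $\bm\theta^{\star}$ is assumed because $\bm{Q}^{\top}\bm{Q} = \bm{I}_d$ requires it.

```python
import math


def orthonormal_complement(theta):
    """Return d-1 orthonormal vectors orthogonal to the unit vector theta.

    Uses modified Gram-Schmidt on the standard basis, skipping the one
    basis vector whose residual is numerically zero.
    """
    d = len(theta)
    basis = [list(theta)]
    for i in range(d):
        v = [1.0 if j == i else 0.0 for j in range(d)]
        for q in basis:
            dot = sum(vj * qj for vj, qj in zip(v, q))
            v = [vj - dot * qj for vj, qj in zip(v, q)]
        norm = math.sqrt(sum(vj * vj for vj in v))
        if norm > 1e-8:
            basis.append([vj / norm for vj in v])
        if len(basis) == d:
            break
    return basis[1:]   # the columns of U, stored as a list of vectors


def tangent_coordinates(U, theta_hat, theta_star):
    """Coordinates of theta_hat - theta_star in the basis U of the tangent space."""
    err = [a - b for a, b in zip(theta_hat, theta_star)]
    return [sum(uj * ej for uj, ej in zip(u, err)) for u in U]


if __name__ == "__main__":
    theta_star = [0.6, 0.8, 0.0]
    U = orthonormal_complement(theta_star)
    print(U)
    print(tangent_coordinates(U, [0.7, 0.7, 0.1], theta_star))
```

Any other orthonormal basis of the orthogonal complement would serve equally well, since the theorem fixes an arbitrary such $\bm U$.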

Figures (6)

  • Figure 1: Asymptotic normality of the LinUCB algorithm in the case where the action set is the unit ball. For a random vector $u$ on the unit ball, we plot $\widehat{\sigma}^{-1} (\frac{2\beta^2 T}{d+1})^{1/4} u^\top ( \widehat{{\bm \theta}}_T - {\bm \theta}^\star)$ over 1000 independent trials, with a KDE estimate overlaid and Shapiro-Wilk $p$-values reported as a test for non-normality. The plots are consistent with asymptotic normality, but the rate of convergence to the true parameter is empirically slower than the $1/\sqrt{T}$ parametric rate, corroborating our theory.
  • Figure 2: Estimation error of the parameter estimate obtained by ridge regression within the LinUCB algorithm. That is, we plot $\lVert \overline{\bm \theta}_t - {\bm \theta}^\star\rVert_2$ against timesteps. In this simulation, the action set is the unit ball and the optimal parameter is the first standard basis vector. We see that the estimation error decreases at the $T^{-1/4}$ rate predicted by Theorem \ref{thm:ucb-unconstrained}.
  • Figure 3: Growth of $\lambda_{t,d}$ and $\overline{\lambda}_t$. Throughout the entire process, these two quantities grow at the same order, staying within a constant-factor band of a deterministic growth benchmark $\lambda_t^{\star}$. When $t\geq t_2$, the minimum eigenvalue $\lambda_{t,d}$ concentrates close to the non-leading mean $\overline{\lambda}_t$, and both $\lambda_{t,d}$ and $\overline{\lambda}_t$ concentrate around the deterministic limit $\lambda_t^{\star}$ when $t=T$.
  • Figure 4: Rate of growth of the eigenvalues of the covariance matrix in a simulation of LinUCB where the action set is the unit ball and the optimal parameter is the first standard basis vector. We see that the non-leading eigenvalues increase linearly at first, as predicted by Proposition \ref{prop:first-stage}, before increasing on the order of $\sqrt{t}$, as predicted by Proposition \ref{prop:second-stage}. The dashed red line in the bottom row denotes the theoretical rate in Theorem \ref{thm:ucb-unconstrained}. Overall, the simulation aligns well with our theory.
  • Figure 5: Concentration of the top eigenvector from the parameter estimate obtained through ridge regression and the true parameter. This is compared with the refined theoretical bound in Proposition \ref{prop:third-stage}.
  • ...and 1 more figure
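The quantity plotted in Figure 1 can be computed directly from its formula. The sketch below only evaluates the scaling factor $(\frac{2\beta^2 T}{d+1})^{1/4}$ and the resulting normalized error; the function names `clt_scale` and `normalized_error` are hypothetical, and the noise-scale estimate $\widehat{\sigma}$ is taken as a given input because its construction is not described here.

```python
import math


def clt_scale(T, beta, d):
    """Scaling factor (2 * beta^2 * T / (d + 1))^(1/4) from the Figure 1 statistic."""
    return (2.0 * beta ** 2 * T / (d + 1)) ** 0.25


def normalized_error(u, theta_hat, theta_star, sigma_hat, beta, T):
    """Statistic sigma_hat^{-1} * clt_scale(T, beta, d) * u^T (theta_hat - theta_star)."""
    d = len(theta_star)
    proj = sum(ui * (th - ts) for ui, th, ts in zip(u, theta_hat, theta_star))
    return clt_scale(T, beta, d) * proj / sigma_hat


if __name__ == "__main__":
    # Multiplying T by 16 only doubles the scaling factor, reflecting the
    # T^{-1/4} rate (a sqrt(T)-type normalization would quadruple instead).
    print(clt_scale(16000, 2.0, 3) / clt_scale(1000, 2.0, 3))
    print(normalized_error((1.0, 0.0), (0.5, 0.0), (0.0, 0.0), 1.0, 1.0, 24))
```

Collecting `normalized_error` over many independent runs and inspecting its histogram mirrors the experiment summarized in the Figure 1 caption.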

Theorems & Definitions (20)

  • Definition 1: Stability
  • Theorem 1: Asymptotic normality for LinUCB
  • Corollary 1: Confidence set of LinUCB
  • Theorem 2: Uniform control of estimation error
  • Corollary 2
  • Theorem 3: Eigenstructure concentration of LinUCB
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • ...and 10 more