Statistical Inference under Adaptive Sampling with LinUCB
Wei Fan, Kevin Tan, Yuting Wei
TL;DR
This work develops statistical inference under adaptive data collection for stochastic linear bandits by analyzing LinUCB with a unit-ball action set. It proves asymptotic normality of the LinUCB estimator projected onto the tangent space (θ*)⊥, with a T^{-1/4} convergence rate, enabling Wald-type confidence sets that are distribution-based and asymptotically tighter than prior nonasymptotic bounds. A key technical contribution is a precise, nonasymptotic characterization of the design covariance Λ_T, which shows a rank-one signal along θ* and an almost isotropic bulk that grows at rate √T, together with a four-phase analysis of its evolution. The results yield an effective sample size of about √T, justify inference under adaptivity, and are supported by simulations illustrating the normality and the tightening of confidence sets in adaptive linear bandit settings.
Abstract
Adaptively collected data has become ubiquitous in modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect estimators of certain parameters to be asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm: the estimation error is asymptotically normal, with convergence at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
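As a rough illustration of the covariance structure described above, the following minimal sketch simulates a LinUCB-style rule on the unit ball and returns the design matrix $\Lambda_T$ for inspection. Everything here is an assumption of the sketch rather than the paper's setup: the function name `run_linucb_unit_ball`, the fixed bonus multiplier `beta`, the ridge parameter `lam`, the Gaussian noise, the choice $\theta^* = e_1$, and in particular the action selection, which approximates the maximization over the unit ball by scoring a finite set of random unit vectors plus the current estimate's direction.

```python
import numpy as np


def run_linucb_unit_ball(T=2000, d=3, lam=1.0, beta=2.0, sigma=1.0,
                         n_candidates=256, seed=0):
    """Simulate an approximate LinUCB rule on the unit ball (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(d)
    theta_star[0] = 1.0                      # assumed true parameter, unit norm
    Lam = lam * np.eye(d)                    # regularized design covariance
    b = np.zeros(d)                          # running sum of reward * action
    for _ in range(T):
        Lam_inv = np.linalg.inv(Lam)
        theta_hat = Lam_inv @ b              # ridge estimate
        # Candidate actions: random directions, plus the greedy direction.
        cands = rng.standard_normal((n_candidates, d))
        if np.linalg.norm(theta_hat) > 0:
            cands[0] = theta_hat
        cands /= np.linalg.norm(cands, axis=1, keepdims=True)
        # Exploration bonus ||a||_{Lam^{-1}} for each candidate a.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", cands, Lam_inv, cands))
        a = cands[np.argmax(cands @ theta_hat + beta * bonus)]
        reward = a @ theta_star + sigma * rng.standard_normal()
        Lam += np.outer(a, a)
        b += reward * a
    theta_hat = np.linalg.solve(Lam, b)
    return Lam, theta_hat, theta_star


if __name__ == "__main__":
    T = 2000
    Lam, theta_hat, theta_star = run_linucb_unit_ball(T=T)
    eigvals, eigvecs = np.linalg.eigh(Lam)   # eigenvalues in ascending order
    print("eigenvalues of Lambda_T:", eigvals)
    print("sqrt(T) for comparison:", np.sqrt(T))
    print("|<top eigenvector, theta*>|:", abs(eigvecs[:, -1] @ theta_star))
```

Comparing the smaller eigenvalues with $\sqrt{T}$ and the top eigenvector with $\theta^*$ across several horizons gives an informal view of the rank-one-plus-bulk picture. Because the action selection is only approximate and `beta` is held fixed, the output should not be read as a reproduction of the paper's simulations.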
