Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits
Nicklas Werge, Yi-Shan Wu, Abdullah Akgül, Melih Kandemir
TL;DR
This work tackles non-stationary linear contextual bandits through Weighted Sequential Bayesian (WSB) inference, which maintains a tractable posterior over the time-varying parameters $\theta_t$. A novel concentration inequality decomposes the error into drift, noise, and a prior term that diminishes over time, enabling principled uncertainty quantification and guiding three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. WSB-LinUCB matches the best-known regret guarantees of Weighted Regularized Least-Squares (WRLS) approaches, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving $O(d^2)$ per-round computation, which makes them attractive for online, long-horizon tasks. Empirical evaluations on synthetic non-stationary scenarios corroborate the theoretical findings and highlight the practical impact of incorporating prior information and Bayesian updates in dynamic environments.
Abstract
We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we consider Weighted Sequential Bayesian (WSB) inference, which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
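To make the idea of a weighted posterior with $O(d^2)$ per-round cost concrete, here is a minimal sketch. It is not the paper's WSB update, whose exact weighting scheme is not given in this summary. It assumes a Gaussian prior, Gaussian reward noise, and a simple exponential forgetting factor `gamma` that tempers the previous posterior before each rank-one update. The class name `DiscountedGaussianPosterior`, the function `select_arm`, and the parameters `prior_var`, `noise_var`, `gamma`, and `beta` are all hypothetical names chosen for illustration.

```python
import math


def mat_vec(M, v):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


class DiscountedGaussianPosterior:
    """Gaussian posterior N(mean, cov) over the reward parameter.

    Assumption: past information (data and prior) is down-weighted by
    tempering the previous posterior with exponent gamma in (0, 1].
    """

    def __init__(self, d, prior_var=1.0, noise_var=1.0, gamma=0.99):
        self.d = d
        self.noise_var = noise_var
        self.gamma = gamma
        self.mean = [0.0] * d
        self.cov = [[prior_var if i == j else 0.0 for j in range(d)]
                    for i in range(d)]

    def update(self, x, reward):
        """Incorporate one (context, reward) pair in O(d^2) time."""
        d = self.d
        # Forgetting step: tempering N(mean, cov) by gamma keeps the mean
        # and inflates the covariance to cov / gamma.
        cov = [[c / self.gamma for c in row] for row in self.cov]
        # Rank-one conjugate update (Sherman-Morrison form).
        cx = mat_vec(cov, x)
        denom = self.noise_var + dot(x, cx)
        resid = reward - dot(x, self.mean)
        self.mean = [m + g * resid / denom for m, g in zip(self.mean, cx)]
        self.cov = [[cov[i][j] - cx[i] * cx[j] / denom for j in range(d)]
                    for i in range(d)]

    def ucb_index(self, x, beta=1.0):
        """Posterior mean reward plus beta posterior standard deviations."""
        cx = mat_vec(self.cov, x)
        return dot(x, self.mean) + beta * math.sqrt(max(dot(x, cx), 0.0))


def select_arm(posterior, contexts, beta=1.0):
    """Pick the arm whose context has the largest optimistic index."""
    return max(range(len(contexts)),
               key=lambda a: posterior.ucb_index(contexts[a], beta))
```

In a bandit loop one would call `select_arm` on the current contexts, observe the reward of the chosen arm, and pass that pair to `update`. With `gamma < 1` the covariance grows in directions that are not revisited, so a practical variant would likely cap or regularize it. The randomized algorithms named above would replace the fixed `beta` bonus with a random perturbation or a posterior sample, which this sketch does not cover.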
