
Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits

Nicklas Werge, Yi-Shan Wu, Abdullah Akgül, Melih Kandemir

TL;DR

This work tackles non-stationary linear contextual bandits by reframing the learning problem through Weighted Sequential Bayesian (WSB) inference, maintaining a tractable posterior over time-varying parameters $\theta_t$. A novel concentration inequality decomposes error into drift, noise, and a diminishing prior term, enabling principled uncertainty quantification and guiding three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. The results show that these Bayesian methods achieve regret guarantees that match or improve upon WRLS-based approaches while preserving $O(d^2)$ per-round computation, making them attractive for online, long-horizon tasks. Empirical evaluations on synthetic non-stationary scenarios corroborate the theoretical findings, highlighting the practical impact of incorporating prior information and Bayesian updates in dynamic environments.

Abstract

We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.

Paper Structure

This paper contains 41 sections, 19 theorems, 95 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Lemma 1

For any $\delta \in (0,1)$, with probability at least $1-\delta$, the following inequalities hold for all $t \in \mathbb{N}_{+}$: and $\forall x \in \mathcal{X}_{t}, \; \lvert \langle x , \hat{\theta}_{t-1} - \theta_{t}^{*} \rangle \rvert \leq (\alpha_{t-1}^{\mathrm{WRLS}} + \beta_{t-1}^{\mathrm{WRLS}}(\delta)) \lVert x \rVert_{V_{t-1}^{-1}}$, where

Figures (1)

  • Figure 1: Regret comparison of algorithms in the abruptly changing scenario (left figure) and the slowly varying scenario (right figure), averaged over $100$ different trials. Vertical red dashed lines mark the change-points in the abruptly changing scenario (unknown to the algorithms).

Theorems & Definitions (37)

  • Lemma 1: wang2023revisit
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Corollary 3
  • ...and 27 more