Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits

Nicklas Werge; Yi-Shan Wu; Abdullah Akgül; Melih Kandemir

TL;DR

This work tackles non-stationary linear contextual bandits by reframing the learning problem through Weighted Sequential Bayesian (WSB) inference, maintaining a tractable posterior over time-varying parameters $\theta_t$. A novel concentration inequality decomposes error into drift, noise, and a diminishing prior term, enabling principled uncertainty quantification and guiding three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. The results show that these Bayesian methods achieve regret guarantees that match or improve upon WRLS-based approaches while preserving $O(d^2)$ per-round computation, making them attractive for online, long-horizon tasks. Empirical evaluations on synthetic non-stationary scenarios corroborate the theoretical findings, highlighting the practical impact of incorporating prior information and Bayesian updates in dynamic environments.

Abstract

We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
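To make the idea of a weighted posterior with a fading prior concrete, the following is a minimal sketch and not the paper's WSB update. It assumes a Gaussian prior, Gaussian noise with known variance, and exponential discounting with a factor `gamma`; the class name, `gamma`, `noise_var`, and the `width` argument (a stand-in for a confidence radius) are all hypothetical. The precision matrix is discounted and then incremented by a rank-one term, and the Sherman-Morrison identity keeps each update at $O(d^2)$ cost.

```python
import numpy as np


class DiscountedGaussianPosterior:
    """Illustrative discounted Bayesian linear regression (assumed form, not the paper's WSB)."""

    def __init__(self, prior_mean, prior_cov, gamma=0.99, noise_var=1.0):
        self.cov = np.array(prior_cov, dtype=float)  # posterior covariance (inverse precision)
        # Natural parameter b = precision @ mean, so that mean = cov @ b.
        self.b = np.linalg.solve(self.cov, np.asarray(prior_mean, dtype=float))
        self.gamma = gamma          # assumed discount factor in (0, 1]
        self.noise_var = noise_var  # assumed known noise variance

    @property
    def mean(self):
        return self.cov @ self.b

    def update(self, x, reward):
        """Precision <- gamma * precision + x x^T / noise_var, applied to the inverse in O(d^2)."""
        x = np.asarray(x, dtype=float)
        Sx = self.cov @ x
        denom = self.gamma * self.noise_var + x @ Sx
        # Sherman-Morrison on the discounted precision; cov stays symmetric.
        self.cov = (self.cov - np.outer(Sx, Sx) / denom) / self.gamma
        self.b = self.gamma * self.b + reward * x / self.noise_var

    def ucb_scores(self, actions, width):
        """Posterior mean reward plus width * ||x||_cov for each row x of `actions`."""
        A = np.asarray(actions, dtype=float)
        quad = np.einsum("ij,jk,ik->i", A, self.cov, A)
        return A @ self.mean + width * np.sqrt(quad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    post = DiscountedGaussianPosterior(np.zeros(3), np.eye(3), gamma=0.95)
    for _ in range(50):
        actions = rng.normal(size=(5, 3))
        chosen = actions[np.argmax(post.ucb_scores(actions, width=1.0))]
        reward = chosen @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1)
        post.update(chosen, reward)
    print(post.mean)
```

Because the prior's contribution to the precision is multiplied by `gamma` at every round, its influence shrinks geometrically, which loosely mirrors the decaying prior term described in the abstract. Pure discounting can also inflate the covariance in directions that are rarely explored, so a practical variant would likely add some form of regularization.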

Paper Structure (41 sections, 19 theorems, 95 equations, 1 figure, 1 table, 3 algorithms)

This paper contains 41 sections, 19 theorems, 95 equations, 1 figure, 1 table, 3 algorithms.

INTRODUCTION
This work.
Contributions.
Notations.
BACKGROUND
Problem Formulation
Weighted Regularized Least-Squares
WRLS estimator.
Error decomposition.
Concentration bounds for WRLS.
Deterministic Exploration with WRLS
Regret guarantees.
Randomized Exploration with WRLS
Randomized UCB.
Thompson Sampling.
...and 26 more sections

Key Result

Lemma 1

For any $\delta \in (0,1)$, with probability at least $1-\delta$, the following inequality holds for all $t \in \mathbb{N}_{+}$ and all $x \in \mathcal{X}_{t}$: $\lvert \langle x , \hat{\theta}_{t-1} - \theta_{t}^{*} \rangle \rvert \leq (\alpha_{t-1}^{\mathrm{WRLS}} + \beta_{t-1}^{\mathrm{WRLS}}(\delta)) \lVert x \rVert_{V_{t-1}^{-1}}$, where

Figures (1)

Figure 1: Regret comparison of algorithms in the abruptly changing scenario (left figure) and the slowly varying scenario (right figure), averaged over $100$ different trials. Vertical red dashed lines mark the change-points in the abruptly changing scenario (unknown to the algorithms).

Theorems & Definitions (37)

Lemma 1: wang2023revisit
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Theorem 1
Corollary 1
Theorem 2
Corollary 2
Corollary 3
...and 27 more
