Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Zhiyong Wang; Jize Xie; Yi Chen; John C. S. Lui; Dongruo Zhou

TL;DR

Two novel algorithms are introduced, addressing the cases where the variance information of the rewards is known and unknown, respectively; both outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings.

Abstract

We investigate the non-stationary stochastic linear bandit problem, where the reward distribution evolves in each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which is the sum of the changes between consecutive feature vectors of the linear bandit over $K$ rounds. However, this quantity only measures the non-stationarity with respect to the expectation of the reward distribution, which makes existing algorithms sub-optimal under the general non-stationary distribution setting. In this work, we propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they achieve tighter regret upper bounds. Specifically, we introduce two novel algorithms: Restarted $\text{WeightedOFUL}^+$ and Restarted $\text{SAVE}^+$. These algorithms address the cases where the variance information of the rewards is known and unknown, respectively. Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings. Experimental evaluations further validate the superior performance of our proposed algorithms over existing works.
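To make the two quantities in the abstract concrete, the following is a minimal sketch of how $B_K$ and $V_K$ could be computed for a toy instance. It assumes that the "change" between consecutive vectors is measured in the Euclidean norm and that $V_K$ is the sum of the per-round noise variances $\sigma_k^2$; the abstract does not spell out either definition, so both are assumptions, and the function names and toy data are hypothetical.

```python
import math


def total_variation_budget(thetas):
    """B_K sketch: sum of distances between consecutive vectors.

    Assumption: the change is measured in the Euclidean norm.
    """
    return sum(math.dist(a, b) for a, b in zip(thetas, thetas[1:]))


def total_variance(sigmas):
    """V_K sketch: sum of per-round noise variances sigma_k^2.

    Assumption: sigmas holds per-round standard deviations.
    """
    return sum(s * s for s in sigmas)


# Hypothetical toy instance with K = 4 rounds in dimension d = 2.
thetas = [(1.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 2.0)]
sigmas = [0.1, 0.1, 0.5, 0.2]

B_K = total_variation_budget(thetas)
V_K = total_variance(sigmas)
print(f"B_K = {B_K:.2f}, V_K = {V_K:.2f}, K = {len(sigmas)}")
```

In this toy instance the mean reward drifts noticeably while $V_K$ stays well below $K$, which is the regime the abstract singles out as favourable for variance-dependent bounds.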

Paper Structure (18 sections, 13 theorems, 84 equations, 1 figure, 1 table, 1 algorithm)

Introduction
Notation
Related Work
Non-stationary (Linear) Bandits
Linear Bandits with Heteroscedastic Noises
Problem Setting
Non-stationary Linear Contextual Bandit with Known Variance
Non-stationary Linear Contextual Bandit with Unknown Variance and Total Variation Budget
Unknown Per-round Variance, Known $V_K$ and $B_K$
Unknown Per-round Variance, Unknown $V_K$ and $B_K$
Experiments
Conclusion and Future Work
$\text{Restarted SAVE}^+$-BOB
Proof of Lemma \ref{lemma:key}
Proof of Theorem \ref{thm: regret for algo1 final}
...and 3 more sections

Key Result

Lemma 4.1

Let $0<\delta<1$. Then with probability at least $1-\delta$, for any action $\mathbf{a} \in \mathbb{R}^d$, we have

Figures (1)

Figure 1: The regret of Restarted $\text{WeightedOFUL}^+$, Restarted $\text{SAVE}^+$, SW-UCB and Modified EXP3.S under different numbers of total rounds.

Theorems & Definitions (20)

Lemma 4.1
Theorem 4.2
Remark 4.3
Corollary 4.4
Remark 4.5
Remark 4.6
Remark 4.7
Theorem 5.1
Remark 5.2
Corollary 5.3
...and 10 more
