Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou

TL;DR

Two novel algorithms are introduced, one for the case where the variance of the rewards is known and one for the case where it is unknown. Both outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings.

Abstract

We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which is the sum of the changes between consecutive feature vectors of the linear bandits over $K$ rounds. However, such a quantity only measures the non-stationarity with respect to the expectation of the reward distribution, which makes existing algorithms sub-optimal under the general non-stationary distribution setting. In this work, we propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds. Specifically, we introduce two novel algorithms: Restarted-$\text{WeightedOFUL}^+$ and Restarted $\text{SAVE}^+$. These algorithms address the cases where the variance information of the rewards is known and unknown, respectively. Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings. Experimental evaluations further validate the superior performance of our proposed algorithms over existing works.
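
The two quantities named in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes that $B_K = \sum_{k=1}^{K-1} \|\boldsymbol{\theta}_{k+1} - \boldsymbol{\theta}_k\|_2$ for a sequence of parameter (feature) vectors $\boldsymbol{\theta}_k$ and that $V_K = \sum_{k=1}^{K} \sigma_k^2$ for per-round reward standard deviations $\sigma_k$. The function names, the choice of the $\ell_2$ norm, and the synthetic drift are all hypothetical.

```python
import numpy as np


def total_variation_budget(thetas):
    """Assumed definition: B_K = sum_k ||theta_{k+1} - theta_k||_2.

    `thetas` is a (K, d) array holding one parameter vector per round.
    """
    thetas = np.asarray(thetas, dtype=float)
    # Differences between consecutive rows, then the l2 norm of each difference.
    return float(np.linalg.norm(np.diff(thetas, axis=0), axis=1).sum())


def total_variance(sigmas):
    """Assumed definition: V_K = sum_k sigma_k^2 for per-round noise std sigma_k."""
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.sum(sigmas ** 2))


# Hypothetical environment: slowly drifting parameters, low-variance rewards.
K, d = 1000, 5
rng = np.random.default_rng(0)
steps = rng.normal(scale=1e-3, size=(K - 1, d))
thetas = np.vstack([np.zeros(d), np.cumsum(steps, axis=0)])
sigmas = np.full(K, 0.05)

B_K = total_variation_budget(thetas)
V_K = total_variance(sigmas)
print(f"B_K = {B_K:.4f}, V_K = {V_K:.4f}, K = {K}")
```

In this synthetic setting $V_K$ is far smaller than $K$, which is the regime the abstract singles out as the one where variance-dependent bounds improve on bounds stated in terms of $K$ and $B_K$ alone.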

Paper Structure

This paper contains 18 sections, 13 theorems, 84 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 4.1

Let $0<\delta<1$. Then with probability at least $1-\delta$, for any action $\mathbf{a} \in \mathbb{R}^d$, we have

Figures (1)

  • Figure 1: The regret of Restarted-$\text{WeightedOFUL}^+$, Restarted $\text{SAVE}^+$, SW-UCB, and Modified EXP3.S for different numbers of total rounds.

Theorems & Definitions (20)

  • Lemma 4.1
  • Theorem 4.2
  • Remark 4.3
  • Corollary 4.4
  • Remark 4.5
  • Remark 4.6
  • Remark 4.7
  • Theorem 5.1
  • Remark 5.2
  • Corollary 5.3
  • ...and 10 more