Variance-Dependent Regret Lower Bounds for Contextual Bandits

Jiafan He, Quanquan Gu

TL;DR

This work establishes variance-dependent regret lower bounds for linear contextual bandits in two key settings: a prefixed variance sequence and an adaptive variance sequence with a weak adversary. It introduces intricate constructions and proof techniques (peeling, multi-instance, and high-probability boosting) to show that, under prefixed sequences, the expected regret is at least $\Omega(d\sqrt{\sum_{k=1}^K \sigma_k^2}/\log K)$, aligning with variance-aware upper bounds up to logarithms. For adaptive sequences, a high-probability lower bound of $\Omega(d\sqrt{\sum_{k=1}^K \sigma_k^2}/\log^6(dK))$ holds under a weak adversary, while a strong-adversary model demonstrates that no general variance-dependent lower bound can exist. The results collectively bridge the gap between lower and upper bounds in the variance-aware linear bandit setting and delineate fundamental limitations when adversaries can tailor variance after observing the decision sets. The findings have implications for designing variance-aware algorithms (like SAVE) and for understanding the inherent difficulty of learning under heteroscedastic rewards.

Abstract

Variance-dependent regret bounds for linear contextual bandits, which improve upon the classical $\tilde{O}(d\sqrt{K})$ regret bound to $\tilde{O}(d\sqrt{\sum_{k=1}^K \sigma_k^2})$, where $d$ is the context dimension, $K$ is the number of rounds, and $\sigma_k^2$ is the noise variance in round $k$, have been widely studied in recent years. However, most existing works focus on regret upper bounds rather than lower bounds. To our knowledge, the only lower bound is from Jia et al. (2024), who proved that for any eluder dimension $d_{\textbf{elu}}$ and total variance budget $\Lambda$, there exists an instance with $\sum_{k=1}^K \sigma_k^2 \leq \Lambda$ on which any algorithm incurs regret of at least $\Omega(\sqrt{d_{\textbf{elu}}\Lambda})$. This lower bound has a $\sqrt{d}$ gap with existing upper bounds. Moreover, it only considers a fixed total variance budget $\Lambda$ and does not apply to a general variance sequence $\{\sigma_1^2,\ldots,\sigma_K^2\}$. In this paper, to overcome the limitations of Jia et al. (2024), we consider the general variance sequence under two settings. For a prefixed sequence, where the entire variance sequence is revealed to the learner at the beginning of the learning process, we establish a variance-dependent lower bound of $\Omega(d\sqrt{\sum_{k=1}^K \sigma_k^2}/\log K)$ for linear contextual bandits. For an adaptive sequence, where an adversary can generate the variance $\sigma_k^2$ in each round $k$ based on historical observations, we show that when the adversary must generate $\sigma_k^2$ before observing the decision set $\mathcal{D}_k$, a similar lower bound of $\Omega(d\sqrt{\sum_{k=1}^K \sigma_k^2}/\log^6(dK))$ holds. In both settings, our results match the upper bounds of the SAVE algorithm (Zhao et al., 2023) up to logarithmic factors.
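The difference between the classical and variance-dependent rates can be made concrete with a small numerical sketch. The snippet below is illustrative only: it drops every constant hidden by $\tilde{O}$ and $\Omega$, the function names are hypothetical, and the variance sequence is invented for the example rather than taken from the paper.

```python
import math


def classical_rate(d, K):
    """Classical rate d * sqrt(K), with constants and log factors dropped."""
    return d * math.sqrt(K)


def variance_rate(d, variances):
    """Variance-dependent rate d * sqrt(sum of sigma_k^2), constants dropped."""
    return d * math.sqrt(sum(variances))


def prefixed_lower_rate(d, variances):
    """Prefixed-sequence lower-bound rate d * sqrt(sum sigma_k^2) / log K."""
    K = len(variances)
    if K < 2:
        raise ValueError("need at least two rounds so that log K > 0")
    return variance_rate(d, variances) / math.log(K)


# Hypothetical setup: 100 noisy rounds (sigma^2 = 1), then near-noiseless rounds.
d = 10
K = 10_000
variances = [1.0] * 100 + [0.01] * (K - 100)

print("classical d*sqrt(K):        ", classical_rate(d, K))
print("variance-dependent rate:    ", variance_rate(d, variances))
print("prefixed lower-bound rate:  ", prefixed_lower_rate(d, variances))
```

With a mostly low-noise sequence like this one, the variance-dependent rate is far below $d\sqrt{K}$, and the lower-bound rate differs from it only by the $\log K$ factor. When every $\sigma_k^2 = 1$, the two upper-bound rates coincide.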

Paper Structure

This paper contains 22 sections, 13 theorems, 35 equations.

Key Result

Theorem 1.1

For any linear contextual bandit problem, the regret of the SAVE algorithm in the first $K$ rounds is upper bounded by $\tilde{O}\big(d\sqrt{\sum_{k=1}^K \sigma_k^2}\big)$, where $d$ is the dimension and $\sigma^2_{k}$ is the noise variance of the selected action in round $k$.
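Read alongside the prefixed-sequence lower bound stated in the abstract, the SAVE upper bound pins down the achievable regret up to logarithmic factors. The display below is a rough sketch rather than a statement from the paper: $\Lambda = \sum_{k=1}^K \sigma_k^2$ is used as shorthand, $\mathrm{Regret}^{\star}(K)$ is a hypothetical symbol for the best achievable worst-case regret over instances sharing the variance sequence, and constants, lower-order additive terms, and the distinction between expected and high-probability guarantees are all glossed over.

$$
\Omega\!\left(\frac{d\sqrt{\Lambda}}{\log K}\right) \;\le\; \mathrm{Regret}^{\star}(K) \;\le\; \tilde{O}\!\left(d\sqrt{\Lambda}\right)
$$

For comparison, taking $d_{\textbf{elu}}$ of order $d$ in the earlier bound $\Omega(\sqrt{d_{\textbf{elu}}\Lambda})$ of Jia et al. (2024) leaves a ratio of

$$
\frac{d\sqrt{\Lambda}}{\sqrt{d\,\Lambda}} \;=\; \sqrt{d}
$$

against the upper bound, which is the $\sqrt{d}$ gap mentioned in the abstract.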

Theorems & Definitions (21)

  • Theorem 1.1: Theorem 2.3, Zhao et al. (2023)
  • Theorem 1.2: Theorem 5.1, Jia et al. (2024)
  • Remark 3.1
  • Theorem 4.1
  • Remark 4.2
  • Lemma 4.3
  • Remark 4.4
  • Remark 4.5: Linear Contextual Bandits vs. Stochastic Linear Bandits
  • Remark 5.1
  • Theorem 5.2: Weak Adversary
  • ...and 11 more