Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Artin Tajdini, Jonathan Scarlett, Kevin Jamieson

TL;DR

This work studies stochastic linear bandits under heavy-tailed rewards with a bounded $(1+\epsilon)$-moment, introducing a robust, geometry-aware estimator and a phased elimination algorithm (MED-PE) that leverages experimental design to minimize a moment-based risk. It achieves an improved upper bound of $\tilde{\mathcal{O}}\big(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}}\big)$ and a lower bound of $\Omega\big(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}}\big)$, demonstrating a dimension-dependent improvement over prior work and establishing a tighter separation from multi-armed bandits. The paper also provides finite-action refinements, action-set dependent bounds for specific geometries (e.g., $l_p$ balls with $p\le 1+\epsilon$), and kernelized results for the Matérn kernel, where the kernel trick yields sublinear regret for all $\epsilon\in(0,1]$. These contributions collectively advance the understanding of heavy-tailed noise in linear bandits and offer practical, geometry-aware methods with broad applicability, including kernelized settings.

Abstract

We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+ε)$-absolute central moment bounded by $\upsilon$ for some $ε\in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best previously known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+ε}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only a $Ω(d^{\frac{ε}{1+ε}} T^{\frac{1}{1+ε}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($ε= 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3ε}{2(1+ε)}} T^{\frac{1}{1+ε}})$, thus improving the dependence on $d$ for all $ε\in (0,1)$ and recovering a known optimal result for $ε= 1$. We also establish a lower bound of $Ω(d^{\frac{2ε}{1+ε}} T^{\frac{1}{1+ε}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action set dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + ε$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $ε\in (0, 1]$.
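To make the comparison of dimension dependence concrete, the following is a minimal sketch that tabulates the exponent of $d$ in each bound quoted in the abstract (all of them share the factor $T^{\frac{1}{1+ε}}$). The function name `d_exponents` and the dictionary keys are hypothetical labels chosen for illustration, and logarithmic factors and constants are ignored.

```python
from fractions import Fraction


def d_exponents(eps):
    """Exponent of d in each regret bound quoted in the abstract.

    Illustrative only: the names are hypothetical, and log factors and
    constants hidden by the O-tilde / Omega notation are ignored.
    """
    eps = Fraction(eps)
    return {
        "prior_upper": Fraction(1),                    # d * T^{1/(1+eps)}
        "new_upper": (1 + 3 * eps) / (2 * (1 + eps)),  # d^{(1+3eps)/(2(1+eps))}
        "new_lower": 2 * eps / (1 + eps),              # d^{2eps/(1+eps)}
        "mab_lower": eps / (1 + eps),                  # d^{eps/(1+eps)}
    }


if __name__ == "__main__":
    for eps in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        exps = d_exponents(eps)
        row = ", ".join(f"{name}={value}" for name, value in exps.items())
        print(f"eps={eps}: {row}")
```

For $ε = 1/2$ the exponents are $5/6$ (new upper bound) against $1$ (prior upper bound), and $2/3$ (new lower bound) against $1/3$ (multi-armed bandit rate). At $ε = 1$ the new upper and lower exponents both equal $1$, consistent with the abstract's statement that the known optimal finite-variance result is recovered.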

Paper Structure

This paper contains 25 sections, 12 theorems, 57 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Fix the action set $\mathcal{A} = \{x \in [0, 1]^{2d} \,:\, x_{2i - 1} + x_{2i} = 1 \quad\forall i\in [d] \}$. There exists a reward distribution with a $(1 + \epsilon)$-central moment bounded by $1$ and a $\theta^* \in \mathbb{R}^{2d}$ with $\|\theta^*\|_2 \le 1$ and $\sup_{x \in {\mathcal{A}}} |x^

Figures (3)

  • Figure 1: Comparison of regret bounds across $\epsilon$ for $T = d^4$, and scaling of the bounds in $d$.
  • Figure 2: Comparison of our regret upper bound (solid) and the lower bound of Cho19 (dashed). We plot the exponent $c$ such that the regret bound has dependence $T^{c}$, with the 4 pairs of curves corresponding to $\nu/d \in \{0.25,1,4\}$ and $\nu/d \to \infty$.
  • Figure 3: Regret vs. dimension $d$ with time horizon $T = 100{,}000$ and $N = 2d$ arms.

Theorems & Definitions (15)

  • Theorem 1
  • proof
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Remark 1
  • Lemma 2
  • proof
  • Corollary 1
  • Corollary 2
  • ...and 5 more