$(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

Gianmarco Genalti; Lupo Marsigli; Nicola Gatti; Alberto Maria Metelli

TL;DR

This work tackles heavy-tailed stochastic bandits in which the moment order $\epsilon$ and the moment bound $u$ are unknown, and it studies the cost of adapting to them relative to the non-adaptive setting. It introduces a fully data-driven trimmed-mean estimator whose threshold is computed empirically (via root finding) and a new algorithm, AdaR-UCB, that uses this estimator to adapt to unknown $\epsilon$ and $u$ under a truncated non-positivity assumption. The authors prove two negative results showing that, without extra assumptions, adaptation cannot preserve the non-adaptive regret guarantees, and they derive a minimax lower bound that persists under the truncated non-positivity assumption. Together, these results justify the need for assumptions such as truncated non-positivity. AdaR-UCB achieves worst-case regret close to the non-adaptive lower bound and nearly matches instance-dependent lower bounds in the adaptive setting. This makes it the first fully data-driven approach with near-optimal guarantees when the heavy-tailed moment parameters are unknown, under a mild distributional assumption.
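To make the idea of a trimmed mean with an empirically chosen threshold concrete, here is a minimal Python sketch. It is not taken from the paper: the threshold equation used here (the average of $\min(X_i^2, M^2)/M^2$ set equal to $c\log(1/\delta)/s$), the constant `c`, and the function names `threshold_gap`, `empirical_threshold`, and `trimmed_mean` are all assumptions made for illustration. The root is found by plain bisection, which works because the left-hand side is non-increasing in $M$.

```python
import math


def threshold_gap(samples, m, delta, c=1.0):
    """Left-hand side of the ASSUMED threshold equation at a candidate m > 0."""
    s = len(samples)
    # Each term min(x^2, m^2) / m^2 lies in [0, 1] and shrinks as m grows.
    clipped = sum(min(x * x, m * m) / (m * m) for x in samples)
    return clipped / s - c * math.log(1.0 / delta) / s


def empirical_threshold(samples, delta, c=1.0, iters=200):
    """Approximate the root of threshold_gap in m by bisection."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    nonzero = [abs(x) for x in samples if x != 0.0]
    if len(nonzero) <= c * math.log(1.0 / delta):
        raise ValueError("too few nonzero samples for a root to exist")
    lo = min(nonzero)  # gap(lo) > 0: every nonzero sample is fully clipped
    hi = max(nonzero)
    while threshold_gap(samples, hi, delta, c) > 0.0:
        hi *= 2.0  # enlarge the bracket until the gap becomes non-positive
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if threshold_gap(samples, mid, delta, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def trimmed_mean(samples, delta, c=1.0):
    """Average the samples after zeroing those above the empirical threshold."""
    m = empirical_threshold(samples, delta, c)
    return sum(x for x in samples if abs(x) <= m) / len(samples)
```

For example, with 99 samples equal to 1 and a single outlier equal to 1000, `delta = 0.1` and `c = 1`, the threshold lands near 8.72, so the outlier is discarded and the estimate is 0.99, whereas the plain sample mean is 10.99. Neither the moment order nor the moment bound enters the computation, which is the point of a data-driven threshold.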

Abstract

Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions has gained popularity only over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order $1+ε$, uniformly bounded by a constant $u<+\infty$, for some $ε\in (0,1]$. In this setting, we study the regret minimization problem when $ε$ and $u$ are unknown to the learner, which must adapt to them. First, we show that adaptation comes at a cost and derive two negative results proving that the regret guarantees of the non-adaptive case cannot be achieved without further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel adaptive regret minimization algorithm, AdaR-UCB, that leverages this estimator. Finally, we show that AdaR-UCB is the first algorithm that, under a known distributional assumption, enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
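The moment condition described above can be written out as follows. This is a sketch in assumed notation: $\nu_i$ for the reward distribution of arm $i$ and $K$ for the number of arms are introduced here for illustration.

```latex
\mathbb{E}_{X \sim \nu_i}\!\left[\,|X|^{1+\epsilon}\,\right] \le u
\qquad \text{for every arm } i \in \{1,\dots,K\},
\quad \epsilon \in (0,1], \quad u < +\infty .
```

With $\epsilon = 1$ the condition bounds the second raw moment, so the variance is finite. For $\epsilon < 1$ the variance may be infinite: a Pareto distribution with shape parameter $\alpha$ has a finite moment of order $1+\epsilon$ only when $\alpha > 1+\epsilon$.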

Paper Structure (25 sections, 19 theorems, 106 equations, 1 table)

Introduction
Related Works
Minimax Lower Bounds for Adaptive Heavy-Tailed Bandits
Negative Results about Adaptivity
Minimax Lower Bound under the Truncated Non-Positivity Assumption
Trimmed Mean Estimator with Empirical Threshold
An $(\epsilon,u)$-Adaptive Approach for Heavy-Tailed Bandits
The Algorithm
Regret Analysis
Conclusions
Additional Related Works
Adaptivity via Lepskii Method
Adaptivity in Subgaussian Bandits
Proofs and Derivations
Lower Bounds
...and 10 more sections

Key Result

Theorem 1

Fix $\epsilon \in (0,1]$ and $u \ge 0$. For every algorithm Alg, sufficiently large learning horizon $T \in \mathbb{N}$, and number of arms $K \in \mathbb{N}_{\ge 2}$, it holds that: where $c_0>0$ is a constant independent of $u$, $\epsilon$, $K$, and $T$.
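The inequality this statement refers to is, in the standard form of the non-adaptive heavy-tailed lower bound from Bubeck et al. (2013), usually written as below. This is a sketch in assumed notation: $\mathcal{P}_{\mathrm{HT}}(\epsilon,u)$ for the class of distributions whose absolute moment of order $1+\epsilon$ is at most $u$, and $R_T(\texttt{Alg},\nu)$ for the expected regret of Alg on instance $\nu$ after $T$ rounds, are not taken from the text above.

```latex
\sup_{\nu \in \mathcal{P}_{\mathrm{HT}}(\epsilon,u)^{K}}
R_T(\texttt{Alg}, \nu)
\;\ge\;
c_0 \, K^{\frac{\epsilon}{1+\epsilon}} \, (uT)^{\frac{1}{1+\epsilon}} .
```

For $\epsilon = 1$ the right-hand side scales as $\sqrt{uKT}$, the familiar rate for finite-variance rewards, while smaller $\epsilon$ pushes the exponent of $T$ toward 1.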

Theorems & Definitions (19)

Theorem 1: Minimax lower bound -- non-adaptive (Bubeck et al., 2013)
Theorem 2: Minimax lower bound -- $u$-adaptive
Theorem 3: Minimax lower bound -- $\epsilon$-adaptive
Theorem 4: Minimax lower bound under the truncated non-positivity assumption -- non-adaptive
Lemma 1: $(\epsilon,u)$-free Upper Confidence Bound
Theorem 5: Bounds on $\widehat {M}_{s}(\delta)$
Theorem 6: $(\epsilon,u)$-dependent Concentration Bound
Theorem 7: Instance-Dependent Regret bound of AdaR-UCB
Theorem 8: Worst-Case Regret bound of AdaR-UCB
Theorem 8: Minimax lower bound -- $u$-adaptive
...and 9 more
