uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs

Yu Chen; Jiatai Huang; Yan Dai; Longbo Huang

TL;DR

The paper tackles heavy-tailed Multi-Armed Bandits in both stochastic and adversarial environments without prior knowledge of heavy-tail parameters, proposing uniINF, a parameter-free BoBW algorithm. It combines Follow-the-Regularized-Leader with a log-barrier regularizer, an auto-balancing learning-rate schedule, and an adaptive skipping-clipping mechanism to handle heavy-tailed losses. Theoretical results show near-optimal regret in adversarial settings, and instance-dependent, near-optimal regret in stochastic settings, with a detailed BoBW regret-decomposition and novel analyses of the Bregman divergence and shifting terms. This work advances robust, parameter-free decision-making under heavy-tailed, time-varying losses, with implications for online learning systems facing unpredictable environments.
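To make the Follow-the-Regularized-Leader step with a log-barrier regularizer concrete, here is a minimal sketch of how one such update could be computed. It is a generic illustration, not the paper's algorithm: the function name `log_barrier_ftrl`, the fixed learning rate `eta`, and the bisection solver are all assumptions, and uniINF's auto-balancing learning-rate schedule is not reproduced here.

```python
from typing import List


def log_barrier_ftrl(cum_loss: List[float], eta: float, iters: int = 100) -> List[float]:
    """Hypothetical helper: solve
        argmin_x  <cum_loss, x> + (1/eta) * sum_i -log(x_i)
    over the probability simplex, using bisection on the Lagrange multiplier.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    k = len(cum_loss)
    base = -min(cum_loss)

    # Stationarity gives x_i = 1 / (eta * (cum_loss[i] + lam)).
    # The sum of x_i is decreasing in lam, is >= 1 at lam = base + 1/eta,
    # and is <= 1 at lam = base + k/eta, so the root lies in [lo, hi].
    lo, hi = base + 1.0 / eta, base + k / eta

    def total(lam: float) -> float:
        return sum(1.0 / (eta * (loss + lam)) for loss in cum_loss)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid  # probabilities sum to more than 1: increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    x = [1.0 / (eta * (loss + lam)) for loss in cum_loss]
    s = sum(x)
    return [xi / s for xi in x]  # renormalize to remove residual bisection error


if __name__ == "__main__":
    # Arms with smaller cumulative loss receive more probability mass.
    print(log_barrier_ftrl([0.0, 1.0, 5.0], eta=2.0))
```

Because the log-barrier blows up as any coordinate approaches zero, every arm keeps strictly positive probability, which is what keeps importance-weighted loss estimates finite.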

Abstract

In this paper, we present a novel algorithm, uniINF, for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability in both stochastic and adversarial environments. Unlike the stochastic MAB setting, where loss distributions are stationary over time, our study extends to the adversarial setup, where losses are generated from heavy-tailed distributions that depend on both arms and time. uniINF enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments without knowing the exact environment type. Moreover, our algorithm is Parameter-Free, i.e., it operates without needing to know the heavy-tail parameters $(σ, α)$ a priori. To be precise, uniINF ensures nearly optimal regret in both stochastic and adversarial environments, matching the corresponding lower bounds for known $(σ, α)$ up to logarithmic factors. To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem. Technically, we develop innovative techniques to achieve BoBW guarantees for Parameter-Free HTMABs, including a refined analysis of the log-barrier dynamics, an auto-balancing learning-rate scheduling scheme, an adaptive skipping-clipping loss tuning technique, and a stopping-time analysis for logarithmic regret.
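As a rough illustration of what skipping and clipping a heavy-tailed loss mean in general, the sketch below contrasts the two operations and plugs a clipped loss into a standard importance-weighted estimator. Everything here is an assumption made for illustration: the helper names and the fixed `threshold` are hypothetical, and the adaptive rule by which uniINF chooses its thresholds is not reproduced.

```python
from typing import List


def skip(loss: float, threshold: float) -> float:
    """Skipping: discard (zero out) a loss whose magnitude exceeds the threshold."""
    return loss if abs(loss) <= threshold else 0.0


def clip(loss: float, threshold: float) -> float:
    """Clipping: cap a loss to the interval [-threshold, threshold]."""
    return max(-threshold, min(threshold, loss))


def importance_weighted_estimate(
    num_arms: int, arm: int, loss: float, probs: List[float], threshold: float
) -> List[float]:
    """Loss-vector estimate after pulling `arm`: the clipped loss divided by
    the probability of the pulled arm, and zero for every other arm."""
    estimate = [0.0] * num_arms
    estimate[arm] = clip(loss, threshold) / probs[arm]
    return estimate


if __name__ == "__main__":
    # An outlier of 100 is capped at 5 by clipping and dropped by skipping.
    print(clip(100.0, 5.0), skip(100.0, 5.0))
    print(importance_weighted_estimate(3, 1, 100.0, [0.5, 0.25, 0.25], 5.0))
```

Both operations bound the magnitude of the loss fed to the learner at the cost of some bias; choosing the threshold well normally requires knowing $(σ, α)$, which is why a parameter-free method has to tune it adaptively.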

Paper Structure (35 sections, 27 theorems, 174 equations, 2 tables, 1 algorithm)

Introduction
Related Work
Preliminaries: Heavy-Tailed Multi-Armed Bandits
The BoBW HTMAB Algorithm uniINF
Refined Log-Barrier Analysis
Auto-Balancing Learning Rate Scheduling Scheme
Adaptive Skipping-Clipping Loss Tuning Technique
Main Results
Regret Decomposition
Analyzing Bregman Divergence Terms
Analyzing $\Psi$-Shifting Terms
Analyzing Sub-Optimal Skipping Losses
Conclusion
Additional Related Works
Multi-Armed Bandits
...and 20 more sections

Key Result

Theorem 3

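Under adversarial environments, uniINF (Algorithm 1) achieves nearly optimal regret, matching the lower bound for known $(σ, α)$ up to logarithmic factors. Moreover, under stochastic environments, uniINF guarantees instance-dependent, nearly optimal regret.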

Theorems & Definitions (43)

Theorem 3: Main Guarantee
Lemma 4: ${\bm{z}}_t$ is Multiplicatively Close to ${\bm{x}}_t$
Lemma 5: Upper Bound of $\textsc{Div}_t$
Lemma 6: Upper Bound for $S_{T+1}$
Theorem 7: Adversarial Bounds for Bregman Divergence Terms
Theorem 8: Stochastic Bounds for Bregman Divergence Terms
Lemma 9: Upper Bound of $\textsc{Shift}_t$
Theorem 10: Adversarial Bounds for $\Psi$-Shifting Terms
Theorem 11: Stochastic Bounds for $\Psi$-shifting Terms
Lemma 12: Stopping-Time Argument for Skipping Losses
...and 33 more
