When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses

Antoine Moulin; Emmanuel Esposito; Dirk van der Hoeven

TL;DR

This work addresses prediction with expert advice under heavy-tailed losses by assuming only a bounded second moment $\theta$, without prior knowledge of the loss range. It reveals that traditional loss-range adaptive guarantees can be dominated by lower-order terms and introduces loss-range adaptive, parameter-free algorithms using clipping and a multi-scale entropic regularizer. The authors establish sublinear worst-case regret for Hedge and squared-loss settings, deriving $R_T = \mathcal{O}(\sqrt{\theta T \log(K)})$ with improvements under self-bounded environments and $R_T = \mathcal{O}((Y^2 + \sigma) \log(KT))$ for squared losses. They also provide two concrete algorithmic instantiations (LoOT-Free OMD and LoOT-Free FTRL) and a squared-loss variant, achieving best-of-both-worlds guarantees without requiring knowledge of $\theta$, supported by theoretical analyses and experimental evidence that traditional methods struggle with heavy tails and unbounded losses.
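To make the role of clipping concrete, here is a minimal sketch of standard Hedge (exponential weights) run on clipped losses. It is not the paper's LoOT-Free OMD or LoOT-Free FTRL: the fixed learning rate `eta`, the fixed threshold `clip`, and the function name `hedge_with_clipping` are all hypothetical choices made for illustration, whereas the algorithms summarized above are described as parameter-free and use a multi-scale entropic regularizer.

```python
import numpy as np

def hedge_with_clipping(losses, eta, clip):
    """Exponential weights updated with losses clipped to [-clip, clip].

    losses: array of shape (T, K), one row per round, one column per expert.
    eta:    fixed learning rate (illustrative assumption).
    clip:   fixed clipping threshold (illustrative assumption).
    Returns the regret against the best expert, measured on the
    original, unclipped losses.
    """
    T, K = losses.shape
    cum_clipped = np.zeros(K)   # cumulative clipped loss of each expert
    learner_loss = 0.0
    for t in range(T):
        # Shift by the minimum before exponentiating for numerical stability.
        w = np.exp(-eta * (cum_clipped - cum_clipped.min()))
        p = w / w.sum()
        learner_loss += p @ losses[t]                    # loss actually suffered
        cum_clipped += np.clip(losses[t], -clip, clip)   # update uses clipped loss
    return learner_loss - losses.sum(axis=0).min()
```

The learner suffers the true losses but only the clipped values enter the weight update, so a single very large loss cannot wipe out an expert's weight in one round. Choosing `eta` and `clip` without knowing $\theta$ or the loss range is the part that this sketch leaves open.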

Abstract

We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e., the only assumption on the losses is an upper bound on their second moments, denoted by $\theta$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order term in their regret guarantees. We show that this lower-order term, which is often the maximum of the losses, can actually dominate the regret bound in our setting. Specifically, we show that even with small constant $\theta$, this lower-order term can scale as $\sqrt{KT}$, where $K$ is the number of experts and $T$ is the time horizon. We propose adaptive algorithms with improved regret bounds that avoid the dependence on such a lower-order term and guarantee $\mathcal{O}(\sqrt{\theta T\log(K)})$ regret in the worst case, and $\mathcal{O}(\theta\log(KT)/\Delta_{\min})$ regret when the losses are sampled i.i.d. from some fixed distribution, where $\Delta_{\min}$ is the difference between the mean losses of the second best expert and the best expert. Additionally, when the loss function is the squared loss, our algorithm also guarantees improved regret bounds over prior results.
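One way to see how the maximum loss can reach order $\sqrt{KT}$ while the second moment stays constant is the following small simulation. The loss distribution is an illustrative assumption and not necessarily the construction used in the paper: each loss equals $\sqrt{KT}$ with probability $1/(KT)$ and $0$ otherwise, so its second moment is $KT \cdot 1/(KT) = 1$. The helper name `spiky_losses` and the values of `T`, `K`, and `runs` are hypothetical.

```python
import numpy as np

def spiky_losses(T, K, rng):
    """Losses equal to sqrt(K*T) with probability 1/(K*T), and 0 otherwise.

    Each entry has second moment (K*T) * 1/(K*T) = 1.
    """
    spike = np.sqrt(K * T)
    return spike * (rng.random((T, K)) < 1.0 / (K * T))

rng = np.random.default_rng(0)
T, K, runs = 1000, 10, 200

# Largest loss observed over all T rounds and K experts, for each run.
max_losses = np.array([spiky_losses(T, K, rng).max() for _ in range(runs)])
frac_with_spike = np.mean(max_losses > 0)

print(f"spike size sqrt(KT) = {np.sqrt(K * T):.1f}")
print(f"fraction of runs whose max loss equals the spike: {frac_with_spike:.2f}")
```

Across the $KT$ draws the probability of seeing at least one spike is $1 - (1 - 1/(KT))^{KT} \approx 1 - 1/e$, so in a constant fraction of runs the maximum loss is exactly $\sqrt{KT}$ even though $\theta = 1$. A regret bound that includes the maximum loss as an additive term therefore cannot treat that term as negligible in this regime.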
