When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses

Antoine Moulin, Emmanuel Esposito, Dirk van der Hoeven

TL;DR

This work addresses prediction with expert advice under heavy-tailed losses by assuming only a bounded second moment $\theta$, without prior knowledge of the loss range. It reveals that traditional loss-range adaptive guarantees can be dominated by lower-order terms and introduces loss-range adaptive, parameter-free algorithms using clipping and a multi-scale entropic regularizer. The authors establish sublinear worst-case regret for Hedge and squared-loss settings, deriving $R_T = \mathcal{O}(\sqrt{\theta T \log(K)})$ with improvements under self-bounded environments and $R_T = \mathcal{O}((Y^2 + \sigma) \log(KT))$ for squared losses. They also provide two concrete algorithmic instantiations (LoOT-Free OMD and LoOT-Free FTRL) and a squared-loss variant, achieving best-of-both-worlds guarantees without requiring knowledge of $\theta$, supported by theoretical analyses and experimental evidence that traditional methods struggle with heavy tails and unbounded losses.

Abstract

We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e., the only assumption on the losses is an upper bound on their second moments, denoted by $\theta$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order term in their regret guarantees. We show that this lower-order term, which is often the maximum of the losses, can actually dominate the regret bound in our setting. Specifically, we show that even with small constant $\theta$, this lower-order term can scale as $\sqrt{KT}$, where $K$ is the number of experts and $T$ is the time horizon. We propose adaptive algorithms with improved regret bounds that avoid the dependence on such a lower-order term and guarantee $\mathcal{O}(\sqrt{\theta T\log(K)})$ regret in the worst case, and $\mathcal{O}(\theta\log(KT)/\Delta_{\min})$ regret when the losses are sampled i.i.d. from some fixed distribution, where $\Delta_{\min}$ is the difference between the mean losses of the second best expert and the best expert. Additionally, when the loss function is the squared loss, our algorithm also guarantees improved regret bounds over prior results.
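The claim that the maximum loss can scale as $\sqrt{KT}$ even when $\theta$ is a small constant can be checked with simple arithmetic. The sketch below uses a two-point loss distribution chosen purely for illustration (it is an assumption here, not necessarily the construction used in the paper): each loss equals $\sqrt{KT}$ with probability $1/(KT)$ and $0$ otherwise, drawn independently for each of the $K$ experts in each of the $T$ rounds. The function name `spike_loss_stats` is hypothetical.

```python
import math


def spike_loss_stats(K: int, T: int) -> dict:
    """Statistics of an illustrative two-point loss (assumed, not from the paper).

    Each loss is sqrt(K*T) with probability 1/(K*T) and 0 otherwise.
    """
    n = K * T                      # total number of losses drawn (K experts, T rounds)
    spike = math.sqrt(n)           # size of the rare large loss
    p = 1.0 / n                    # probability of the large loss
    second_moment = p * spike ** 2           # E[loss^2] = (1/(KT)) * KT = 1
    prob_spike_seen = 1.0 - (1.0 - p) ** n   # P(at least one of the n draws is a spike)
    return {
        "spike": spike,
        "second_moment": second_moment,
        "prob_spike_seen": prob_spike_seen,
    }


if __name__ == "__main__":
    print(spike_loss_stats(K=10, T=1000))
```

Under these assumptions the second moment stays at 1 for every $K$ and $T$, while the probability that some loss of size $\sqrt{KT}$ appears approaches $1 - 1/e$ as $KT$ grows. A regret bound that carries the maximum observed loss as an additive term would therefore inherit a $\sqrt{KT}$ contribution with constant probability.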

Paper Structure

This paper contains 34 sections, 13 theorems, 121 equations, 4 figures, 1 table, 3 algorithms.

Key Result

Theorem 2.2

Consider the Hedge setting and suppose the finite-second-moment assumption holds, i.e., the second moments of the losses are bounded by $\theta$. Then, there exists an algorithm that, without prior information on the losses, guarantees that $R_T = \mathcal{O}\bigl(\sqrt{\theta T \log(K)}\bigr)$.
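To make the Hedge setting concrete, here is a minimal sketch of exponential weights with a fixed clipping threshold applied to the losses before the update. This is a simplified baseline and not the paper's LoOT-Free OMD or LoOT-Free FTRL: it takes the learning rate `eta` and the threshold `clip` as inputs, whereas the algorithm in the theorem needs no prior information on the losses. The function name `clipped_hedge` and its parameters are hypothetical, and no regret guarantee is claimed for this sketch.

```python
import math


def clipped_hedge(losses, eta, clip):
    """Exponential weights with fixed-threshold loss clipping (illustrative baseline).

    losses: list of rounds, each a list of K per-expert losses.
    eta:    learning rate (assumed given; the paper's algorithms do not need it).
    clip:   clipping threshold (assumed given; hypothetical parameter).
    Returns (regret against the best expert on the unclipped losses,
             distribution played in the last round).
    """
    K = len(losses[0])
    cum = [0.0] * K          # cumulative clipped losses per expert
    learner_loss = 0.0
    p = [1.0 / K] * K
    for round_losses in losses:
        # Numerically stable exponential weights: shift by the smallest cumulative loss.
        m = min(cum)
        w = [math.exp(-eta * (c - m)) for c in cum]
        s = sum(w)
        p = [x / s for x in w]
        # The learner suffers the expected (unclipped) loss under p.
        learner_loss += sum(pi * li for pi, li in zip(p, round_losses))
        # Only the update uses clipped losses, limiting the effect of rare spikes.
        for i, li in enumerate(round_losses):
            cum[i] += max(-clip, min(clip, li))
    best = min(sum(r[i] for r in losses) for i in range(K))
    return learner_loss - best, p


if __name__ == "__main__":
    # Expert 0 always incurs loss 0, expert 1 always incurs loss 1.
    rounds = [[0.0, 1.0] for _ in range(100)]
    regret, last_p = clipped_hedge(rounds, eta=0.5, clip=10.0)
    print(regret, last_p)
```

Clipping only the update, rather than the suffered loss, keeps a single extreme observation from wiping out an expert's weight, which is the intuition behind using clipping under heavy tails. Choosing the threshold without knowing $\theta$ or the loss range is the difficulty the paper's adaptive algorithms address.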

Figures (4)

  • Figure 1: Results of experiments with heavy-tailed losses.
  • Figure 2: Results of experiments with heavy-tailed losses. Dotted lines represent mean $\pm$ one standard deviation.
  • Figure 3: Results of experiments with non-heavy-tailed losses.
  • Figure 4: Results of experiments with non-heavy-tailed losses. Dotted lines represent mean $\pm$ one standard deviation.

Theorems & Definitions (25)

  • Theorem 2.2
  • Theorem 2.4
  • Theorem 2.5
  • Lemma 4.1
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • Proof sketch of \ref{th:omdhedge}
  • Proof sketch of \ref{th:introadv}
  • Proof of \ref{lem:bernmax}
  • ...and 15 more