An Axiomatic Approach to Loss Aggregation and an Adapted Aggregating Algorithm

Armando J. Cabrera Pacheco, Rabanus Derr, Robert C. Williamson

TL;DR

This work advances online learning under expert advice by showing that broad, reasonable loss aggregations are exactly the quasi-sums $\mathbf{Q}^u_n(x_1,\dots,x_n)=u^{-1}\big(\sum_{i=1}^n u(x_i)\big)$. It then develops an Aggregating Algorithm variant (APA-QS) that handles these quasi-sums through a weighting profile $f$ and a transformed loss $u$, preserving Bayes-like updating and a time-invariant regret bound via a change of variables $u(x) = -\ln f(x)$. The authors prove optimality results for quasi-sum aggregations, extend the framework to non-mixable losses, and provide an interpretation of aggregation as encoding the forecaster's attitude toward losses, supported by a weather-forecasting experiment that demonstrates how different generators shape predictions and tail behavior. Overall, the paper unifies axiomatic aggregation with online learning guarantees and offers practical guidance on selecting aggregations to control extreme losses.
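
To make the quasi-sum formula concrete, the following minimal sketch evaluates $u^{-1}\big(\sum_i u(x_i)\big)$ for two generators. The function name `quasi_sum` and the choice of generators ($u(x)=x$ and $u(x)=x^2$) are illustrative assumptions, not taken from the paper; the inverse is passed explicitly rather than computed.

```python
import math
from typing import Callable, Sequence


def quasi_sum(losses: Sequence[float],
              u: Callable[[float], float],
              u_inv: Callable[[float], float]) -> float:
    """Aggregate losses as u^{-1}(sum_i u(x_i)); u_inv must invert u."""
    return u_inv(sum(u(x) for x in losses))


# Identity generator: the quasi-sum is the ordinary sum.
plain = quasi_sum([1.0, 2.0, 3.0], lambda x: x, lambda y: y)

# Generator u(x) = x^2 on [0, inf): the quasi-sum is the Euclidean norm.
euclid = quasi_sum([3.0, 4.0], lambda x: x ** 2, math.sqrt)

# Associativity: aggregating in two stages matches aggregating at once.
sq, rt = (lambda x: x ** 2), math.sqrt
staged = quasi_sum([quasi_sum([1.0, 2.0], sq, rt), 3.0], sq, rt)
direct = quasi_sum([1.0, 2.0, 3.0], sq, rt)
```

With the squared generator, large losses dominate the aggregate more than under the plain sum, which is one way to read the claim that the generator encodes an attitude toward extreme losses.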

Abstract

Supervised learning has gone beyond the expected risk minimization framework. Central to most of these developments is the introduction of more general aggregation functions for losses incurred by the learner. In this paper, we turn towards online learning under expert advice. Via easily justified assumptions we characterize a set of reasonable loss aggregation functions as quasi-sums. Based upon this insight, we suggest a variant of the Aggregating Algorithm tailored to these more general aggregation functions. This variant inherits most of the nice theoretical properties of the AA, such as recovery of Bayes' updating and a time-independent bound on quasi-sum regret. Finally, we argue that generalized aggregations express the attitude of the learner towards losses.

Paper Structure

This paper contains 18 sections, 11 theorems, 80 equations, 2 figures, and 3 tables.

Key Result

Lemma 4.2

Let $\mathbf{A} \colon \bigcup_{n \in \mathbb{N}} [0,\infty)^n \longrightarrow [0, \infty)$ be an aggregation function. Suppose that $\mathbf{A}$ is continuous, strictly increasing, associative and loss compatible, i.e., it satisfies the axioms from continuity through loss compatibility. If, furthermore, $\mathbf{A}$ is positively homogeneous, then
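
As a sanity check on how positive homogeneity interacts with the quasi-sum form (not a restatement of the lemma's conclusion), consider the power generator $u(x) = x^p$ with $p > 0$, an illustrative choice. For any $\lambda \geq 0$,

$$\mathbf{Q}^u_n(\lambda x_1,\dots,\lambda x_n) = \Big(\sum_{i=1}^n (\lambda x_i)^p\Big)^{1/p} = \Big(\lambda^p \sum_{i=1}^n x_i^p\Big)^{1/p} = \lambda\, \mathbf{Q}^u_n(x_1,\dots,x_n),$$

so quasi-sums generated by powers are positively homogeneous.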

Figures (2)

  • Figure 1: Graphical Summary of the Steps in the Aggregating Algorithm. Experts $\theta_1$ and $\theta_2$ provide predictions $\xi(\theta_1)$ and $\xi(\theta_2)$, respectively, which are placed in the simplex (top-left) as $x_1 \coloneqq (\xi(\theta_1), 1- \xi(\theta_1))$ and $x_2 \coloneqq (\xi(\theta_2), 1- \xi(\theta_2))$ via $s \mapsto (s,1-s)$. The log-loss embeds the simplex as a curve in $\mathbb{R}^2$ (top-right), i.e., $s \mapsto -\ln s$ is applied coordinate-wise and maps $x_1$ and $x_2$ to $x_1'$ and $x_2'$. Then, the exponential mapping projects them into $[0,1]^2$ as $x_1''$ and $x_2''$. The aggregating algorithm forms a convex combination $\psi$ of these projected predictions, called a pseudo-prediction, based on weights updated by a Bayesian-type formula (orange-brown). The pseudo-prediction is then mapped back to the simplex via a substitution function $\Sigma$ (dark green).
  • Figure 2: Comparative Example of Linear and Squared Utility. The horizontal axis denotes the loss value; the vertical axis shows the negative utility of the loss. We compare the negative utility function $u(x) = x$ to $u(x)=x^2$, in particular at two values: a low value, highlighted by a dark-green arrow, and a high value, highlighted by an orange-brown arrow.
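
The steps summarized in Figure 1 can be sketched for the classical Aggregating Algorithm with binary log-loss and learning rate 1. This is a hedged illustration of the standard algorithm, not the paper's quasi-sum variant; the function names, the uniform prior, and the example data are assumptions. In this special case the substitution step returns the weighted mixture itself, and the weight update coincides with Bayes' rule.

```python
import math
from typing import List, Sequence


def aggregating_algorithm_log_loss(expert_probs: Sequence[Sequence[float]],
                                   outcomes: Sequence[int]) -> List[float]:
    """Return the learner's predicted probability of outcome 1 per round.

    expert_probs[t][i] is expert i's probability of outcome 1 in round t.
    Assumes no expert assigns probability 0 to an observed outcome.
    """
    n_experts = len(expert_probs[0])
    weights = [1.0 / n_experts] * n_experts  # uniform prior (an assumption)
    predictions = []
    for probs, y in zip(expert_probs, outcomes):
        # Pseudo-prediction: convex combination of exp(-loss), i.e. of the
        # experts' probabilities. For log-loss with learning rate 1 the
        # substitution step returns this mixture unchanged.
        p = sum(w * q for w, q in zip(weights, probs))
        predictions.append(p)
        # Bayes-type update: multiply each weight by exp(-log-loss), which is
        # the probability the expert assigned to the observed outcome.
        likelihoods = [q if y == 1 else 1.0 - q for q in probs]
        weights = [w * l for w, l in zip(weights, likelihoods)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return predictions


def log_loss(p: float, y: int) -> float:
    """Log-loss of predicting probability p for outcome 1 when y occurs."""
    return -math.log(p if y == 1 else 1.0 - p)


# Made-up example: two experts, four rounds.
experts = [[0.8, 0.3], [0.7, 0.4], [0.9, 0.2], [0.6, 0.5]]
outcomes = [1, 1, 0, 1]
preds = aggregating_algorithm_log_loss(experts, outcomes)
learner_loss = sum(log_loss(p, y) for p, y in zip(preds, outcomes))
expert_losses = [sum(log_loss(row[i], y) for row, y in zip(experts, outcomes))
                 for i in range(2)]
```

In this setting the learner's cumulative loss exceeds the best expert's by at most $\ln N$ for $N$ experts, regardless of the number of rounds, which is the kind of time-independent bound the abstract refers to.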

Theorems & Definitions (39)

  • Remark 3.1
  • Example 3.2
  • Definition 4.1: Aggregation Functions
  • Lemma 4.2: Axiomatical Characterization of Loss-Aggregations
  • proof
  • Definition 4.3: Aggregation as Quasi-Sum
  • Example 4.4: $p$-Norms
  • Lemma 4.5
  • proof
  • Example 4.6
  • ...and 29 more