Mixability of Integral Losses: a Key to Efficient Online Aggregation of Functional and Probabilistic Forecasts

Alexander Korotin, Vladimir V'yugin, Evgeny Burnaev

TL;DR

This work extends online prediction with expert advice to function-valued forecasts by introducing X-integral losses and proving that η-mixable (exp-concave) losses induce η-mixable (exp-concave) integral losses. The authors establish a general transfer theorem and apply it to a wide class of probabilistic-forecasting losses, including CRPS, SCRPS, energy-based distance, KL, Beta-2, CFD, MMD, OT, and SW2, deriving explicit aggregation rules and learning-rate guarantees. The main theoretical contribution is the transfer of mixability/exp-concavity to the integral setting, enabling the Aggregating Algorithm to achieve constant regret across function-valued predictions. This provides a principled, scalable framework for online probabilistic forecasting and distributional learning, with practical aggregation via simple mixtures or Wasserstein barycenters.

Abstract

In this paper we extend the setting of online prediction with expert advice to function-valued forecasts. At each step of the online game several experts predict a function, and the learner has to efficiently aggregate these functional forecasts into a single forecast. We adapt basic mixable (and exponentially concave) loss functions to compare functional predictions and prove that these adaptations are also mixable (exp-concave). We call this phenomenon mixability (exp-concavity) of integral loss functions. As an application of our main result, we prove that various loss functions used for probabilistic forecasting are mixable (exp-concave). The considered losses include the Sliced Continuous Ranked Probability Score, Energy-Based Distance, Optimal Transport Costs and Sliced Wasserstein-2 distance, Beta-2 and Kullback-Leibler divergences, and Characteristic Function and Maximum Mean Discrepancies.

Paper Structure

This paper contains 20 sections, 8 theorems, 79 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

Let $(\Gamma,\sigma_{\Gamma})$, $(\Omega,\sigma_{\Omega})$, $(\mathcal{X},\sigma_{\mathcal{X}})$ be measurable spaces. Assume that ${\lambda:\Gamma\times\Omega\rightarrow\mathbb{R}_{+}}$ is a loss function measurable w.r.t. the product $\sigma_{\Gamma}\times\sigma_{\Omega}$. Let $\lambda_{u,\mu}$ be the $\mathcal{X}$-integral $\lambda$-loss function. If $\lambda$ is $\eta$-mixable (exp-concave), then $\lambda_{u,\mu}$ is also $\eta$-mixable (exp-concave), with the aggregated forecast defined by point-wise ($x\in \mathcal{X}$) application of the substitution function $\Sigma_{\lambda}$.
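The point-wise aggregation named in Theorem 3.1 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions that are not stated in this summary: the base loss is the square loss on $[0,1]$ (which is $2$-mixable), its substitution function takes the standard form $\gamma = \tfrac{1}{2} + \tfrac{1}{2}\,(g(0) - g(1))$, the space $\mathcal{X}$ is replaced by a uniform grid with equal weights, and the experts forecast CDFs as in the CRPS setting. All function and variable names are hypothetical.

```python
import numpy as np

ETA = 2.0  # assumed mixability rate of the square loss on [0, 1]


def generalized_prediction(expert_vals, weights, outcome):
    """g(outcome) = -(1/eta) * ln(sum_i w_i * exp(-eta * (gamma_i - outcome)^2)).

    expert_vals has shape (n_experts, n_grid); the result has shape (n_grid,).
    """
    losses = (expert_vals - outcome) ** 2
    return -np.log(weights @ np.exp(-ETA * losses)) / ETA


def aggregate_pointwise(expert_vals, weights):
    """Apply the (assumed) square-loss substitution function at every grid point."""
    g0 = generalized_prediction(expert_vals, weights, 0.0)
    g1 = generalized_prediction(expert_vals, weights, 1.0)
    return 0.5 + (g0 - g1) / 2.0


def integral_square_loss(forecast, outcome):
    """Discrete stand-in for the integral loss: mean square loss over the grid."""
    return float(np.mean((forecast - outcome) ** 2))


def mixability_bound(expert_vals, weights, outcome):
    """Right-hand side of the mixability inequality for the integral loss."""
    expert_losses = np.mean((expert_vals - outcome) ** 2, axis=1)
    return float(-np.log(weights @ np.exp(-ETA * expert_losses)) / ETA)


# Three hypothetical experts, each forecasting a uniform CDF on a subinterval.
grid = np.linspace(0.0, 1.0, 201)
experts = np.stack([
    np.clip((grid - lo) / (hi - lo), 0.0, 1.0)
    for lo, hi in [(0.0, 0.5), (0.2, 0.9), (0.4, 1.0)]
])
weights = np.array([0.5, 0.3, 0.2])
aggregated = aggregate_pointwise(experts, weights)

# Outcome y is encoded as the step function x -> 1[y <= x], as for CRPS.
y = 0.4
outcome = (grid >= y).astype(float)
print("aggregated loss:", integral_square_loss(aggregated, outcome))
print("mixability bound:", mixability_bound(experts, weights, outcome))
```

Under these assumptions the aggregated loss should not exceed the bound for any outcome $y$, which is the discrete analogue of the mixability statement in the theorem. The aggregated function is computed independently at each grid point, so the cost is linear in the grid size and the number of experts.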

Figures (2)

  • Figure 1: Visualization of the comparison of CDFs of distributions $\gamma,\omega$ on $[a,b]$ by using Continuous Ranked Probability Score.
  • Figure 2: Visualization of the comparison of CDFs of distributions $\gamma,\omega$ on $[a,b]$ by using Optimal Transport Cost.

Theorems & Definitions (18)

  • Definition 1: Integral loss function
  • Definition 2: Mixable loss function
  • Example 1: Square loss
  • Example 2: Logarithmic loss
  • Definition 3: Exponentially concave loss function
  • Theorem 3.1: Mixability & Exp-concavity of Integral Losses
  • Theorem 3.2: Generalized Hölder Inequality
  • proof
  • Theorem 4.1: Equivalence of SCRPS and Energy-Based Distance
  • proof
  • ...and 8 more