Proper Scoring Rules for Multivariate Probabilistic Forecasts based on Aggregation and Transformation

Romain Pic, Clément Dombry, Philippe Naveau, Maxime Taillardat

TL;DR

The paper addresses the challenge of interpreting multivariate forecast verification by introducing a framework built on transformation and aggregation to construct interpretable proper scoring rules. By combining transformations (e.g., projections, variograms, wavelets) with aggregation across margins, patches, or scales, the approach yields scores that target specific forecast features such as dependence structure and spatial anisotropy while preserving propriety. Through theoretical exposition and simulation, the authors show that these scores can discriminate misspecifications more insightfully than conventional multivariate scores and can bridge the gap between probabilistic scoring and spatial verification tools. The framework thus offers a practical path to more informative forecast verification, with relevance to weather, climate, and ML-based forecasting systems.

Abstract

Proper scoring rules are an essential tool to assess the predictive performance of probabilistic forecasts. However, propriety alone does not ensure an informative characterization of predictive performance and it is recommended to compare forecasts using multiple scoring rules. With that in mind, interpretable scoring rules providing complementary information are necessary. We formalize a framework based on aggregation and transformation to build interpretable multivariate proper scoring rules. Aggregation-and-transformation-based scoring rules are able to target specific features of the probabilistic forecasts, which improves the characterization of the predictive performance. This framework is illustrated through examples taken from the literature and studied using numerical experiments showcasing its benefits. In particular, it is shown that it can help bridge the gap between proper scoring rules and spatial verification tools.

Paper Structure

This paper contains 45 sections, 5 theorems, 128 equations, 4 figures.

Key Result

Proposition 1

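Let $\mathcal{F}\subset\mathcal{P}(\mathbb{R}^d)$ be a class of Borel probability measures on $\mathbb{R}^d$, let $F\in\mathcal{F}$ be a forecast and $\bm{y}\in\mathbb{R}^d$ an observation. Let $T:\mathbb{R}^d\to\mathbb{R}^k$ be a transformation and let $\mathrm{S}$ be a scoring rule on $\mathbb{R}^k$ that is proper relative to $T(\mathcal{F})$. Then the scoring rule $\mathrm{S}_T(F,\bm{y})=\mathrm{S}(T(F),T(\bm{y}))$ is proper relative to $\mathcal{F}$. If $\mathrm{S}$ is strictly proper relative to $T(\mathcal{F})$ and $T$ is injective, then $\mathrm{S}_T$ is strictly proper relative to $\mathcal{F}$.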
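
To make the transformation principle concrete, here is a minimal sketch for ensemble forecasts. It assumes the forecast $F$ is represented by a finite list of members, so that the transformed forecast $T(F)$ is obtained by applying $T$ to each member, and it uses the standard empirical CRPS as the base score $\mathrm{S}$. The function names (`crps_ensemble`, `transformed_score`, `spatial_mean`) are illustrative choices, not names taken from the paper.

```python
from statistics import fmean
from typing import Callable, Sequence

Vector = Sequence[float]


def crps_ensemble(members: Sequence[float], y: float) -> float:
    """Empirical CRPS of a univariate ensemble: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    term_obs = sum(abs(x - y) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread


def transformed_score(
    score: Callable[[Sequence[float], float], float],
    transform: Callable[[Vector], float],
    ensemble: Sequence[Vector],
    y: Vector,
) -> float:
    """Score the transformed forecast against the transformed observation."""
    return score([transform(x) for x in ensemble], transform(y))


def spatial_mean(x: Vector) -> float:
    """Hypothetical transformation T: average over all components."""
    return fmean(x)


# Two ensembles with different margins but identical member means.
ensemble_a = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0]]
ensemble_b = [[2.0, 0.0], [3.0, 1.0], [4.0, 2.0]]
observation = [1.0, 3.0]

score_a = transformed_score(crps_ensemble, spatial_mean, ensemble_a, observation)
score_b = transformed_score(crps_ensemble, spatial_mean, ensemble_b, observation)
```

Both ensembles receive the same score because the spatial mean is not injective: the transformed score only sees the distribution of the mean. Such a score can remain proper while not being strictly proper, which is why it is best read as targeting one feature of the forecast rather than the full multivariate distribution.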

Figures (4)

  • Figure 1: Expectation of aggregated univariate scoring rules: (a) the CRPS, (b) the quantile score, (c) the Brier score, and (d) the squared error and the Dawid-Sebastiani score, for the ideal forecast (light violet), a biased forecast (orange), an under-dispersed forecast (lighter blue), an over-dispersed forecast (darker blue) and a local-scale Student forecast (green). More details are available in the main text.
  • Figure 2: Expectation of scoring rules focused on the dependence structure: (a) the variogram score, (b) the $p$-variation score and (c) the patched energy score (and its limiting cases: the aggregated CRPS and the energy score), for the ideal forecast (violet), the small-range forecast (lighter blue), the large-range forecast (darker blue), the under-smooth forecast (lighter orange), and the over-smooth forecast (darker orange). More details are available in the main text.
  • Figure 3: Expectation of interpretable proper scoring rules focused on the dependence structure: (a) the variogram score and (b) the anisotropic score, for the ideal forecast (violet), the small-angle forecast (lighter blue), the large-angle forecast (darker blue), the isotropic forecast (lighter orange) and the over-anisotropic forecast (darker orange). More details are available in the main text.
  • Figure 4: Expectation of scoring rules tested on their sensitivity to the double-penalty effect: (a) the aggregated CRPS and the aggregated CRPS of spatial mean, and (b) the aggregated Brier score and the aggregated squared error of fraction of threshold exceedances, for the ideal forecast (violet), the additive-noised forecasts (shades of blue), and the multiplicative-noised forecasts (shades of orange). For the noised forecasts, darker colors correspond to larger values of the range $r\in\{0.1,\ 0.25,\ 0.5\}$. More details are available in the main text.
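
The variogram score that appears in Figures 2 and 3 can be read as an aggregation of squared errors applied to the pairwise transformations $|x_i - x_j|^p$. The sketch below assumes an ensemble forecast and the standard definition $\mathrm{VS}_p(F,\bm{y})=\sum_{i<j} w_{ij}\,(|y_i-y_j|^p-\mathbb{E}_F|X_i-X_j|^p)^2$, with the expectation replaced by the ensemble average. The function name, the unit default weights, and the toy data are assumptions made for illustration and do not reproduce the paper's experimental setup.

```python
from itertools import combinations
from typing import Optional, Sequence

Vector = Sequence[float]


def variogram_score(
    ensemble: Sequence[Vector],
    y: Vector,
    p: float = 0.5,
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Ensemble variogram score of order p, summed over pairs i < j."""
    m = len(ensemble)
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        # Unit weights unless a weight matrix is supplied (illustrative default).
        w = 1.0 if weights is None else weights[i][j]
        observed = abs(y[i] - y[j]) ** p
        # Ensemble average stands in for the expectation under F.
        forecast = sum(abs(x[i] - x[j]) ** p for x in ensemble) / m
        total += w * (observed - forecast) ** 2
    return total


ensemble = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0]]
observation = [0.0, 1.0, 2.0]
vs = variogram_score(ensemble, observation, p=1.0)

# Adding the same constant to every component leaves pairwise differences unchanged.
shifted = [[value + 10.0 for value in member] for member in ensemble]
vs_shifted = variogram_score(shifted, observation, p=1.0)
```

The shifted ensemble obtains the same score as the original one, which illustrates why the variogram score is informative about the dependence structure but blind to a constant bias, and why it is typically paired with a score on the margins such as the aggregated CRPS.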

Theorems & Definitions (16)

  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Proposition 3
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 6 more