Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting

Benjamin R. Redhead, Thomas L. Lee, Peng Gu, Víctor Elvira, Amos Storkey

TL;DR

This work tackles the problem of evaluating probabilistic multivariate time-series forecasts with a focus on tail events and temporal dependencies. It introduces Sig-MMD and CSig-MMD, two kernel-based metrics built on the signature kernel to compare forecasted joint distributions from samples, without assuming independence over time or variables. Sig-MMD uses the signature feature space to capture complex dependencies, while CSig-MMD adds a censoring scheme via Mahalanobis distance and a tail-focused weight to emphasize tail accuracy while preserving properness. The approach is validated on synthetic and real-world datasets, including zero-shot foundation models, showing improved sensitivity to multivariate structure and tail behavior, with practical implications for robust, tail-aware forecasting evaluation and model development.

Abstract

Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume independence across time steps or variables, and they demonstrably lack sensitivity to tail events, the very occurrences that are most pivotal in real-world decision-making. To address these limitations, we propose two kernel-based metrics: the signature maximum mean discrepancy (Sig-MMD) and our novel censored Sig-MMD (CSig-MMD). By leveraging the signature kernel, these metrics capture complex inter-variate and inter-temporal dependencies and remain robust to missing data. Furthermore, CSig-MMD introduces a censoring scheme that prioritizes a forecaster's capability to predict tail events while strictly maintaining properness, a vital property for a good scoring rule. These metrics enable a more reliable evaluation of direct multi-step forecasting, facilitating the development of more robust probabilistic algorithms.
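
To make the idea of comparing forecast and observed path distributions in signature feature space concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a signature truncated at level 2 with a plain inner-product kernel, whereas the paper uses the signature kernel itself, and a truncated kernel is not characteristic. The function names `truncated_signature` and `sig_mmd2` are hypothetical. The estimator is the standard biased MMD estimate, $\widehat{\mathrm{MMD}}^2 = \lVert \bar{\phi}_P - \bar{\phi}_Q \rVert^2$, where $\bar{\phi}$ is the mean feature vector of each sample set.

```python
import numpy as np

def truncated_signature(path):
    """Level-0, level-1 and level-2 signature terms of a piecewise-linear path.

    `path` has shape (T, d). This truncation is an illustrative assumption,
    not the full signature kernel.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)        # (T-1, d) segment increments
    level1 = increments.sum(axis=0)           # total displacement
    offsets = path[:-1] - path[0]             # displacement before each segment
    # Iterated integral over s < t of dX^i_s dX^j_t for a piecewise-linear path.
    level2 = offsets.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([[1.0], level1, level2.ravel()])

def sig_mmd2(samples_p, samples_q):
    """Biased MMD^2 estimate between two sets of paths of shape (n, T, d)."""
    feats_p = np.stack([truncated_signature(p) for p in samples_p])
    feats_q = np.stack([truncated_signature(q) for q in samples_q])
    diff = feats_p.mean(axis=0) - feats_q.mean(axis=0)
    return float(diff @ diff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    forecast = rng.normal(size=(50, 16, 3)).cumsum(axis=1)   # synthetic random walks
    observed = rng.normal(size=(50, 16, 3)).cumsum(axis=1)
    print(sig_mmd2(forecast, observed))
```

Because the level-2 terms are iterated integrals across pairs of channels, the score reacts to the order in which variables move, which is the kind of inter-variate and inter-temporal dependence that per-step, per-variable metrics ignore.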

Paper Structure (25 sections, 1 theorem, 18 equations, 7 figures, 25 tables)

Key Result

Proposition 4.1

As $k_{sig}$ is a characteristic signature kernel, let $w(x)$ be a measurable weighting function defining a censored distribution as in the censoring equation (eq:censoring). Then censored Sig-MMD remains strictly proper. For any method defining the censored region such that it induces a fixed measurable partition of th

Figures (7)

  • Figure 1: Comparison of forecast samples on Tail (Top, ERA5) and Body (Bottom, EWELD) scenarios. Top: Chronos-2, which receives the lowest score from Sig-MMD, QL, and VS, fails to predict the initial extreme spike and instead predicts phantom spikes later. In contrast, Moirai, which scores lowest on our censored metric (CSig-MMD), captures both the magnitude and the timing of the initial tail event. Bottom: In the body scenario, Chronos-2, which scores lowest on Sig-MMD, fits the series geometry closely with low noise. Moirai-MoE, which scores lowest on CRPS and ES, instead produces noisier samples that adhere less strictly to the ground-truth scale. This highlights the ability of signature-based metrics to distinguish shape and tail fidelity where standard metrics fail.
  • Figure 2: The censoring process redistributes the probability mass from the body of the distribution (grey) to the signature of the zero path, while preserving the probability mass inside the target region (blue), which represents the tails of the distribution.
  • Figure 3: CSig-MMD score as a function of the censoring quantile on the Exchange dataset. As the censoring quantile is reduced, more points fall into the tail region; as the tail region converges to the whole set of outcomes, CSig-MMD converges to Sig-MMD.
  • Figure 4: Power heatmaps for Wrong Mean (All Dimensions) experiment.
  • Figure 5: Power heatmaps for Wrong Exponential Scaling experiment.
  • ...and 2 more figures
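
The censoring step described in the Figure 2 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: the tail region is fixed in advance from a set of reference paths, membership is decided by the Mahalanobis distance of the flattened path exceeding a chosen quantile of the reference distances, and body paths are replaced by the zero path (whose signature is trivial). The helper names `fit_censor`, `mahalanobis`, and `censor`, and the `ridge` regulariser, are hypothetical.

```python
import numpy as np

def mahalanobis(flat, mean, precision):
    """Mahalanobis distance of each row of `flat` (n, D) from `mean`."""
    centred = flat - mean
    sq = np.einsum("ni,ij,nj->n", centred, precision, centred)
    return np.sqrt(np.maximum(sq, 0.0))       # guard against tiny negative round-off

def fit_censor(reference_paths, quantile=0.9, ridge=1e-6):
    """Fix the tail region from reference paths of shape (n, T, d).

    Assumption: paths whose distance exceeds the `quantile` of the
    reference distances count as tail events.
    """
    flat = reference_paths.reshape(len(reference_paths), -1)
    mean = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False) + ridge * np.eye(flat.shape[1])
    precision = np.linalg.inv(cov)
    threshold = np.quantile(mahalanobis(flat, mean, precision), quantile)
    return mean, precision, threshold

def censor(paths, mean, precision, threshold):
    """Keep tail paths unchanged and map body paths to the zero path."""
    flat = paths.reshape(len(paths), -1)
    in_tail = mahalanobis(flat, mean, precision) > threshold
    censored = np.where(in_tail[:, None, None], paths, 0.0)
    return censored, in_tail

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(500, 4, 2))
    mean, precision, threshold = fit_censor(reference, quantile=0.9)
    censored, in_tail = censor(reference, mean, precision, threshold)
    print(in_tail.mean())                      # fraction of paths kept as tail events
```

In this sketch the same fitted `mean`, `precision`, and `threshold` would be applied to both forecast samples and observations before comparing them, so the partition into body and tail does not depend on the forecast being scored. Lowering `quantile` enlarges the tail region, in line with the behaviour described for Figure 3.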

Theorems & Definitions (1)

  • Proposition 4.1