Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting
Benjamin R. Redhead, Thomas L. Lee, Peng Gu, Víctor Elvira, Amos Storkey
TL;DR
This work addresses the problem of evaluating probabilistic multivariate time-series forecasts, with a focus on tail events and temporal dependencies. It introduces Sig-MMD and CSig-MMD, two kernel-based metrics built on the signature kernel that compare forecast joint distributions using only samples, without assuming independence across time steps or variables. Sig-MMD uses the signature feature space to capture complex dependencies, while CSig-MMD adds a censoring scheme based on the Mahalanobis distance and a tail-focused weight, which emphasizes tail accuracy while preserving properness. The approach is validated on synthetic and real-world datasets, including evaluations of zero-shot foundation models, and shows improved sensitivity to multivariate structure and tail behavior. These results have practical implications for robust, tail-aware forecast evaluation and model development.
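To make the Sig-MMD idea more concrete, the following is a minimal NumPy sketch under stated assumptions. It does not use the full (untruncated) signature kernel. Instead it substitutes a simpler stand-in: the linear kernel on signatures truncated at depth 2, computed exactly for piecewise-linear paths via Chen's identity, with a normalised time channel prepended to each path. The helper names (`add_time`, `truncated_signature`, `_mean_offdiag`, `sig_mmd2`) are hypothetical and are not the authors' API. A truncated signature kernel is not characteristic, so this sketch can miss distributional differences that only show up at higher signature levels.

```python
import numpy as np


def add_time(path):
    """Prepend a normalised time channel t in [0, 1] to a path of shape (L, d)."""
    t = np.linspace(0.0, 1.0, path.shape[0])[:, None]
    return np.hstack([t, path])


def truncated_signature(path):
    """Depth-2 signature of a piecewise-linear path of shape (L, d).

    By Chen's identity, each linear segment with increment delta contributes
    (1, delta, outer(delta, delta) / 2), and segment signatures multiply in
    the tensor algebra; the closed form below is that product truncated at level 2.
    """
    inc = np.diff(path, axis=0)                # (L-1, d) segment increments
    prev = path[:-1] - path[0]                 # displacement accumulated before each segment
    level1 = path[-1] - path[0]                # total increment
    level2 = prev.T @ inc + 0.5 * inc.T @ inc  # second-level iterated integrals, (d, d)
    return np.concatenate(([1.0], level1, level2.ravel()))


def _mean_offdiag(K):
    """Mean of K[i, j] over i != j; a single sample is treated as a point mass."""
    n = K.shape[0]
    if n == 1:
        return K[0, 0]  # E k(Y, Y') = k(y, y) when Y is deterministic
    return (K.sum() - np.trace(K)) / (n * (n - 1))


def sig_mmd2(forecast, observed):
    """Squared MMD with a depth-2 truncated signature kernel (hypothetical helper).

    forecast: (m, L, d) sampled forecast paths.
    observed: (n, L, d) observed paths (n = 1 for a single realisation).
    """
    fx = np.stack([truncated_signature(add_time(p)) for p in forecast])
    fy = np.stack([truncated_signature(add_time(p)) for p in observed])
    # Linear kernel on signature features: k(x, y) = <S(x), S(y)>.
    k_xx, k_yy, k_xy = fx @ fx.T, fy @ fy.T, fx @ fy.T
    return _mean_offdiag(k_xx) + _mean_offdiag(k_yy) - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 3 groups of 200 random-walk paths, 20 time points, 2 variables.
    steps = rng.normal(size=(3, 200, 20, 2))
    base = np.cumsum(steps[0], axis=1)
    same = np.cumsum(steps[1], axis=1)           # same distribution as base
    drifted = np.cumsum(steps[2] + 0.5, axis=1)  # positive drift in both variables
    print(sig_mmd2(base, same), sig_mmd2(base, drifted))
```

With a single observed path (n = 1), the term among observed paths reduces to k(y, y), so `sig_mmd2` then acts as a kernel score of the forecast ensemble against that one realisation. Because the forecast term excludes the diagonal, the estimate can be slightly negative when both sample sets come from the same distribution.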
Abstract
Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume independence across time steps or variables, and they demonstrably lack sensitivity to tail events, the very occurrences that are most pivotal in real-world decision-making. To address these limitations, we propose two kernel-based metrics: the signature maximum mean discrepancy (Sig-MMD) and our novel censored Sig-MMD (CSig-MMD). By leveraging the signature kernel, these metrics capture complex inter-variate and inter-temporal dependencies and remain robust to missing data. Furthermore, CSig-MMD introduces a censoring scheme that prioritizes a forecaster's capability to predict tail events while strictly maintaining properness, a vital property for a good scoring rule. These metrics enable a more reliable evaluation of direct multi-step forecasting, facilitating the development of more robust probabilistic algorithms.
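The censoring in CSig-MMD is described here only at a high level (a Mahalanobis distance plus a tail-focused weight), so the sketch below is a generic, hypothetical illustration of Mahalanobis-based censoring rather than the authors' scheme. It assumes a reference mean and covariance estimated beforehand from historical paths, and it maps every path whose squared Mahalanobis distance falls within an illustrative threshold `radius2` to one fixed point (here the reference mean path), leaving tail paths unchanged. The names `fit_reference`, `mahalanobis2` and `censor_paths` are hypothetical, and no tail-focused weight is modelled.

```python
import numpy as np


def fit_reference(history):
    """Mean and inverse covariance of flattened historical paths of shape (N, L, d).

    Assumption: the reference is fixed before any forecast is scored, so the
    same deterministic map is applied to forecasts and observations alike.
    """
    flat = history.reshape(len(history), -1)
    mean = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False) + 1e-6 * np.eye(flat.shape[1])  # jitter for invertibility
    return mean, np.linalg.inv(cov)


def mahalanobis2(paths, mean, cov_inv):
    """Squared Mahalanobis distance of each flattened path from the reference mean."""
    diff = paths.reshape(len(paths), -1) - mean
    return np.einsum("ni,ij,nj->n", diff, cov_inv, diff)


def censor_paths(paths, mean, cov_inv, radius2):
    """Replace every non-tail path (distance^2 <= radius2) with one fixed point.

    The fixed point is the reference mean path (an illustrative choice);
    tail paths are returned unchanged, together with a boolean tail mask.
    """
    in_tail = mahalanobis2(paths, mean, cov_inv) > radius2
    out = np.broadcast_to(mean.reshape(paths.shape[1:]), paths.shape).copy()
    out[in_tail] = paths[in_tail]
    return out, in_tail


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    history = rng.normal(size=(500, 5, 2))  # hypothetical historical paths
    mean, cov_inv = fit_reference(history)
    paths = np.stack([np.zeros((5, 2)), np.full((5, 2), 5.0)])
    # 23.21 is close to the 0.99 chi-square quantile for 10 degrees of freedom.
    censored, in_tail = censor_paths(paths, mean, cov_inv, radius2=23.21)
    print(in_tail)
```

The reason a fixed map can preserve properness: applying the same map T to forecasts and observations turns a positive-definite kernel k into k(T(x), T(y)), which is still positive definite, so an MMD computed on censored paths remains a proper scoring rule. The price is that it can no longer distinguish forecasts that differ only inside the censored region.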
