Scoring rule nets: beyond mean target prediction in multivariate regression
Daan Roordink, Sibylle Hess
TL;DR
This paper tackles multivariate probabilistic regression by introducing Conditional CRPS (CCRPS), a multivariate extension of CRPS that is more sensitive to correlation than the Energy Score. It develops closed-form CCRPS expressions for common distributions and constructs CCRPS-based artificial neural network (ANN) losses for multivariate Gaussian mixtures, along with an Energy Score ensemble loss for differentiable training. In synthetic and real-world experiments, CCRPS-based methods often outperform maximum likelihood estimation and perform on par with non-parametric methods such as Distributional Random Forest. The work demonstrates that CCRPS can improve sharpness while maintaining calibration, providing a practical and principled route to reliable multivariate probabilistic forecasting.
Abstract
Probabilistic regression models trained with maximum likelihood estimation (MLE) can sometimes overestimate variance to an unacceptable degree. This is mainly a problem in the multivariate domain. While univariate models often optimize the popular Continuous Ranked Probability Score (CRPS), no such alternative to MLE has yet been widely accepted in the multivariate domain. The Energy Score, the most investigated alternative, notoriously lacks closed-form expressions and sensitivity to the correlation between target variables. In this paper, we propose Conditional CRPS: a multivariate strictly proper scoring rule that extends CRPS. We show that closed-form expressions exist for popular distributions and illustrate their sensitivity to correlation. We then show, in a variety of experiments on both synthetic and real data, that Conditional CRPS often outperforms MLE and produces results comparable to state-of-the-art non-parametric models, such as Distributional Random Forest (DRF).
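For readers less familiar with the two scoring rules named above, the following is a minimal sketch of the standard closed-form CRPS of a univariate Gaussian and a sample-based estimate of the Energy Score. It does not implement Conditional CRPS, whose definition is given in the paper body rather than in this abstract. The function names `crps_gaussian` and `energy_score`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def energy_score(samples, y):
    """Monte Carlo Energy Score estimate: E||X - y|| - 0.5 * E||X - X'||.

    `samples` is a list of equal-length tuples drawn from the forecast
    distribution; `y` is the observed target as a tuple of the same length.
    """
    m = len(samples)
    to_obs = sum(math.dist(x, y) for x in samples) / m
    pairwise = sum(math.dist(a, b) for a in samples for b in samples) / (m * m)
    return to_obs - 0.5 * pairwise


# An overly wide forecast is penalized: sigma=3 scores worse than sigma=1.
print(crps_gaussian(0.0, 1.0, 0.0), crps_gaussian(0.0, 3.0, 0.0))

# In one dimension the Energy Score coincides with CRPS, so the sample
# estimate should be close to the closed-form value.
rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
draws = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
print(energy_score(draws, (0.5,)), crps_gaussian(0.0, 1.0, 0.5))
```

The closed form needs only the Gaussian density and distribution function, whereas the Energy Score estimate relies on sampling and a pairwise distance sum that grows quadratically with the number of samples. This is one practical reason why closed-form expressions are attractive as training losses.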
