Valid Error Bars for Neural Weather Models using Conformal Prediction

Vignesh Gopakumar; Joel Oskarrson; Ander Gray; Lorenzo Zanisi; Stanislas Pamela; Daniel Giles; Matt Kusner; Marc Deisenroth

TL;DR

The paper tackles the lack of uncertainty quantification in neural weather forecasts by introducing an inductive conformal prediction (CP) post-processing framework that yields calibrated prediction sets with coverage $1-\alpha$ for every spatio-temporal point, without modifying the underlying model. It extends CP to the spatio-temporal domain, calibrating per-cell predictions using non-conformity scores (RES for deterministic outputs and STD for probabilistic Gaussian outputs) and estimating a quantile $\hat{q}$ to form the prediction intervals. The method is demonstrated on Hi-LAM, a limited-area Nordic neural weather model, using both a deterministic MSE and a probabilistic NLL variant, achieving empirical coverage around $91\%$ at the nominal $90\%$ level and showing tighter bounds for STD than RES. Key contributions include per-cell spatio-temporal calibration, a formal non-conformity score framework, and a practical, low-cost approach to obtaining reliable uncertainty bounds that complement ensemble methods.
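
The per-cell calibration summarised above can be illustrated with a minimal sketch. It is not the authors' code: the synthetic Gaussian data, the grid shape `cell_shape`, and the helper names (`conformal_quantile`, `res_scores`, `std_scores`, `fake_forecasts`) are assumptions made for illustration, and the finite-sample quantile rule is the standard split-conformal one, assumed here to match the paper.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Per-cell conformal quantile; axis 0 indexes the n calibration samples."""
    n = scores.shape[0]
    # Standard split-conformal rank: the ceil((n+1)(1-alpha))-th smallest score,
    # capped at n when the calibration set is small.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores, axis=0)[k - 1]

def res_scores(y, pred):
    """RES: absolute residual of a deterministic prediction."""
    return np.abs(y - pred)

def std_scores(y, mu, sigma):
    """STD: absolute residual scaled by the predicted standard deviation."""
    return np.abs(y - mu) / sigma

rng = np.random.default_rng(0)
cell_shape = (4, 8, 8)   # hypothetical (lead time, x, y) grid
n_cal, n_test = 1000, 1000
alpha = 0.1              # nominal coverage 1 - alpha = 90%

# Synthetic stand-in for a forecasting model: each cell has its own error scale.
noise_scale = rng.uniform(0.5, 2.0, size=cell_shape)

def fake_forecasts(n):
    pred = rng.normal(size=(n, *cell_shape))
    y = pred + noise_scale * rng.normal(size=(n, *cell_shape))
    return y, pred

y_cal, pred_cal = fake_forecasts(n_cal)
y_test, pred_test = fake_forecasts(n_test)

# RES: one quantile per cell, applied as a symmetric error bar.
q_hat = conformal_quantile(res_scores(y_cal, pred_cal), alpha)
lower, upper = pred_test - q_hat, pred_test + q_hat
coverage = np.mean((y_test >= lower) & (y_test <= upper))

# STD: the quantile rescales the predicted standard deviation
# (here the true noise scale plays the role of the model's sigma).
sigma = noise_scale
q_std = conformal_quantile(std_scores(y_cal, pred_cal, sigma), alpha)
covered_std = np.abs(y_test - pred_test) <= q_std * sigma
coverage_std = np.mean(covered_std)

print(f"RES coverage: {coverage:.3f}, STD coverage: {coverage_std:.3f}")
```

Because every cell gets its own quantile, the error bars widen where the calibration residuals are large and tighten where they are small, while the model itself is never retrained.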

Abstract

Neural weather models have shown immense potential as inexpensive and accurate alternatives to physics-based models. However, most models trained to perform weather forecasting do not quantify the uncertainty associated with their forecasts. This limits the trust in the model and the usefulness of the forecasts. In this work we construct and formalise a conformal prediction framework as a post-processing method for estimating this uncertainty. The method is model-agnostic and gives calibrated error bounds for all variables, lead times and spatial locations. No modifications are required to the model and the computational cost is negligible compared to model training. We demonstrate the usefulness of the conformal prediction framework on a limited area neural weather model for the Nordic region. We further explore the advantages of the framework for deterministic and probabilistic models.

Paper Structure (10 sections, 3 equations, 22 figures)

Introduction
Related Work
Conformal Prediction
Conformal Prediction over a Spatio-Temporal Domain
Formal Definition
Non-conformity Scores
Neural Weather Models
Results
Discussion
Additional Results

Figures (22)

Figure 1: Inductive CP Framework over a Deterministic Model (see RES in the Non-conformity Scores section): (1) Perform calibration using a non-conformity metric (L1 error residual, with $\hat{s}$ the calibration scores and $y_c, \tilde{y}_c$ the calibration targets and predictions respectively). (2) Estimate the quantile corresponding to the desired coverage from the CDF of the non-conformity scores ($n$ is the calibration sample size, $(1-\alpha)$ the desired coverage, $F_{\hat{s}}^{-1}$ the quantile function, i.e. the inverse CDF of the non-conformity scores, and $\hat{q}$ the quantile matching the desired coverage). (3) Apply the quantile to the model predictions to estimate the prediction sets ($\tilde{y}_p$ are the model predictions and $\pm\hat{q}$ gives the upper and lower error bars around them).
Figure 2: Prediction (top), Ground Truth (middle) and width of the error bars (bottom) for predicting the temperature 2m above ground (2t) using Hi-LAM (MSE).
Figure 9: Empirical Coverage at different levels of $1-\alpha$ for Hi-LAM (MSE) and Hi-LAM (NLL).
Figure 10: Slice plots across the $x$-axis of a temporal prediction of a single variable (u-component of wind). Figures (a)-(d) depict the ground truth, prediction, and upper and lower error bars obtained through the CP framework for Hi-LAM (MSE) and Hi-LAM (NLL) at 95 percent coverage ($\alpha=0.05$) and 15 percent coverage ($\alpha=0.85$).
Figure : Hi-LAM (MSE)
...and 17 more figures
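
The three steps in the Figure 1 caption can be written out compactly. The block below is a sketch in standard split-conformal notation using the caption's symbols; the finite-sample level $\lceil (n+1)(1-\alpha) \rceil / n$ is the usual textbook choice and is assumed, not confirmed, to match the paper's own equations.

```latex
\begin{aligned}
\text{(1) Calibrate:}\quad & \hat{s} = \lvert y_c - \tilde{y}_c \rvert \\
\text{(2) Quantile:}\quad  & \hat{q} = F_{\hat{s}}^{-1}\!\left( \frac{\lceil (n+1)(1-\alpha) \rceil}{n} \right) \\
\text{(3) Predict:}\quad   & \bigl[\, \tilde{y}_p - \hat{q},\; \tilde{y}_p + \hat{q} \,\bigr]
\end{aligned}
```

Under exchangeability of calibration and test samples, the interval in step (3) contains the true value with probability at least $1-\alpha$. In the spatio-temporal setting, steps (1) to (3) are applied independently at every cell, so $\hat{s}$ and $\hat{q}$ carry the same variable, lead-time and spatial indices as the prediction.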
