Probability calibration for precipitation nowcasting

Lauri Kurki, Yaniel Cabrera, Samu Karanko

TL;DR

Precipitation nowcasting with neural models increasingly requires probabilistic outputs, but conventional calibration metrics fail to capture miscalibration across precipitation thresholds. The authors introduce the expected thresholded calibration error (ETCE) to measure calibration over multiple precipitation thresholds and adapt calibration methods from computer vision, including selective scaling with lead-time conditioning. They find that selective scaling with an MLP or Segformer calibrator reduces ETCE by up to about 23%, while temperature scaling variants offer limited benefit. These results provide a practical path to more reliable probabilistic precipitation nowcasting by conditioning calibrators on lead time and mispredictions.

Abstract

Reliable precipitation nowcasting is critical for weather-sensitive decision-making, yet neural weather models (NWMs) can produce poorly calibrated probabilistic forecasts. Standard calibration metrics such as the expected calibration error (ECE) fail to capture miscalibration across precipitation thresholds. We introduce the expected thresholded calibration error (ETCE), a new metric that better captures miscalibration in ordered classes such as precipitation amounts. We extend post-processing techniques from computer vision to the forecasting domain. Our results show that selective scaling with lead-time conditioning reduces model miscalibration without reducing forecast quality.
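
The idea of measuring calibration across thresholds of ordered classes can be illustrated with a minimal sketch. The exact ETCE definition is not reproduced in this summary, so the code below is an assumption rather than the authors' formula: it converts class probabilities into exceedance probabilities P(class >= k), computes a standard binned binary calibration error at each threshold, and averages the results uniformly. The function names `binary_ece` and `thresholded_calibration_error`, the equal-width binning, and the uniform averaging are all hypothetical choices made for illustration.

```python
import numpy as np


def binary_ece(conf, outcome, n_bins=10):
    """Binned calibration error for binary events.

    Sums, over equal-width confidence bins, the bin's share of samples
    times |mean confidence - observed frequency|.
    """
    conf = np.asarray(conf, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    # Map each confidence to a bin index; values at (or rounding past) 1.0
    # are clipped into the last bin.
    bin_idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - outcome[mask].mean())
    return ece


def thresholded_calibration_error(probs, labels, n_bins=10):
    """Assumed ETCE-like score: mean binary calibration error over thresholds.

    probs:  (N, K) probabilities over K ordered classes (e.g. rain-rate bins).
    labels: (N,) integer index of the observed class.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_classes = probs.shape[1]
    errors = []
    for k in range(1, n_classes):
        exceed_prob = probs[:, k:].sum(axis=1)  # forecast P(class >= k)
        exceeded = labels >= k                  # did the event occur?
        errors.append(binary_ece(exceed_prob, exceeded, n_bins))
    return float(np.mean(errors))


# Usage: three ordered classes, e.g. dry / light rain / heavy rain.
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])
score = thresholded_calibration_error(probs, labels)
```

Working with exceedance probabilities rather than the top-class confidence is what makes the score sensitive to the class ordering: probability mass placed on a neighbouring class is penalised less than mass placed far from the observed class.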

Paper Structure

This paper contains 12 sections, 6 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Miscalibration diagram at precipitation threshold $1.5$ mm/h and selected confidence bins $[0.05, 0.10]$, $[0.45, 0.50]$, $[0.85, 0.90]$. The mean confidence and mean observed frequency for each bin are depicted by dashed and solid curves, respectively.
  • Figure 2: ETCE as a function of lead time for the uncalibrated model, and after applying temperature scaling, local temperature scaling and selective scaling.
  • Figure A.1: Number of predictions at threshold $r \geq 1.0$ mm/h within five selected confidence bins.
  • Figure A.2: ETCE as a function of lead time for selective scaling using different classifiers for flagging mispredictions. Uncalibrated baseline shown with a dashed line.
  • Figure A.3: The difference between model confidence and accuracy for thresholds $r \geq 1.0$ mm/h and $r \geq 2.0$ mm/h before and after applying calibration (selective scaling with an MLP classifier). A reduction in the colored area indicates a reduction in miscalibration; a smaller area between the dashed and solid lines is better.