Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting

Busra Asan, Abdullah Akgül, Alper Unal, Melih Kandemir, Gozde Unal

TL;DR

The paper addresses the challenge of producing reliable, calibrated sub-seasonal temperature forecasts. It extends UNet++ to a Bayesian CNN by treating the final layers as Bayesian, with Gaussian-distributed weights $\theta$, and maximizes the evidence lower bound (ELBO) to obtain predictive distributions. It then applies CDF-based calibration: an isotonic regressor $R: [0,1] \to [0,1]$ is fitted so that the recalibrated forecast CDF $R \circ F_t$ reflects empirical frequencies. The key finding is that well-calibrated Bayesian forecasts are sharper and have more accurate coverage than MC-Dropout or Deep Ensemble baselines, at the cost of a trade-off in traditional point-error metrics such as MAE. The approach uses CMIP6 for training and ERA5 for fine-tuning, and the authors suggest it can be generalized to other climate variables and safety-critical forecasting tasks.
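The recalibration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical forecaster that always predicts a Gaussian that is too wide, uses synthetic data, and implements isotonic regression with a small pool-adjacent-violators routine so that only the Python standard library is needed. All function and variable names (`gaussian_cdf`, `pava`, `fit_recalibrator`, `central_coverage`) are illustrative.

```python
import math
import random
from bisect import bisect_right

def gaussian_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [mean, count]
    for v in values:
        blocks.append([v, 1])
        # Merge neighbouring blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def fit_recalibrator(pit_values):
    """Fit R: [0,1] -> [0,1] from forecast CDF values F_t(y_t) on held-out data."""
    p = sorted(pit_values)
    n = len(p)
    # Target for each p_i: fraction of held-out points with F_t(y_t) <= p_i.
    empirical = [bisect_right(p, pi) / n for pi in p]
    # These targets are already non-decreasing, so PAVA leaves them unchanged;
    # the call is kept to show where the isotonic step sits.
    fitted = pava(empirical)

    def R(q):
        # Step-function lookup: value fitted at the largest p_i <= q.
        i = bisect_right(p, q)
        return 0.0 if i == 0 else fitted[i - 1]

    return R

def central_coverage(cdf_values, level=0.9):
    """Fraction of outcomes whose CDF value lies in the central `level` interval."""
    lo, hi = (1.0 - level) / 2.0, (1.0 + level) / 2.0
    return sum(1 for v in cdf_values if lo <= v <= hi) / len(cdf_values)

# Synthetic setup (assumption): outcomes are N(0, 1), the forecast says N(0, 2^2).
random.seed(0)
MU, SIGMA = 0.0, 2.0
calib_y = [random.gauss(0.0, 1.0) for _ in range(2000)]
test_y = [random.gauss(0.0, 1.0) for _ in range(2000)]

R = fit_recalibrator([gaussian_cdf(y, MU, SIGMA) for y in calib_y])
test_pit = [gaussian_cdf(y, MU, SIGMA) for y in test_y]

cov_before = central_coverage(test_pit)                  # uncalibrated F_t
cov_after = central_coverage([R(p) for p in test_pit])   # recalibrated R(F_t)
print(f"90% interval coverage before: {cov_before:.3f}, after: {cov_after:.3f}")
```

In this toy setting the uncalibrated 90% interval covers almost every outcome because the forecast is too wide, while the recalibrated CDF should bring coverage close to the nominal level on held-out data.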

Abstract

Seasonal forecasting is a crucial task when it comes to detecting the extreme heat and cold events that occur due to climate change. Confidence in the predictions should be reliable, since a small increase in annual temperatures has a big impact on the world. Calibrating neural networks provides a way to ensure that our confidence in the predictions is justified. However, calibrating regression models is an under-researched topic, especially for forecasting models. We calibrate a UNet++ based architecture, which was shown to outperform physics-based models in predicting temperature anomalies. We show that, with a slight trade-off between prediction error and calibration error, it is possible to get more reliable and sharper forecasts. We believe that calibration should be an important part of safety-critical machine learning applications such as weather forecasters.

Paper Structure (7 sections, 4 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: $50\%$ (top) and $90\%$ (bottom) confidence intervals of the Bayesian UNet++ for a sample on the North West Coast of America. Uncalibrated confidence interval plots are shown on the left (green), and calibrated plots on the right (blue). Grey dots mark the average temperature values for each month in the given time period (2016-2021). The mean coverage percentages of the $50\%$ and $90\%$ confidence intervals are $63\%$ and $91\%$ for the calibrated model, and $66\%$ and $98\%$ for the uncalibrated model; the sample shown was chosen to be representative of these values. In the calibrated model's plot, the percentage of values falling within the intervals aligns more closely with the expected confidence levels at both $50\%$ and $90\%$.
  • Figure 2: Calibration plot suggested by Kuleshov et al. (2018), given for a sample in the grid in Figure 1, to evaluate the calibration of the forecasts. Each predicted confidence level is plotted against its corresponding expected confidence level. Predictions illustrate the frequency of observing an outcome $Y_{t}$ at each level. We expect calibrated models to be closer to $y=x$.
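
The quantities behind the two figures, interval coverage and the expected-versus-observed calibration curve, can be computed as in the following sketch. It uses synthetic outcomes and two hypothetical Gaussian forecasters (one well specified, one too wide), not the paper's data or model; the function names and the squared-difference summary `cal_error` are illustrative assumptions.

```python
import math
import random

def gaussian_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def calibration_curve(cdf_values, levels):
    """For each expected level p, the observed fraction of outcomes with F_t(y_t) <= p."""
    n = len(cdf_values)
    return [sum(1 for v in cdf_values if v <= p) / n for p in levels]

def central_coverage(cdf_values, level):
    """Fraction of outcomes inside the central `level` interval of their forecast."""
    lo, hi = (1.0 - level) / 2.0, (1.0 + level) / 2.0
    return sum(1 for v in cdf_values if lo <= v <= hi) / len(cdf_values)

def cal_error(levels, observed):
    """Sum of squared gaps between expected and observed levels (0 on the y = x line)."""
    return sum((p - o) ** 2 for p, o in zip(levels, observed))

# Synthetic outcomes (assumption): y_t ~ N(0, 1).
random.seed(1)
outcomes = [random.gauss(0.0, 1.0) for _ in range(5000)]
levels = [i / 10 for i in range(1, 10)]

# Forecast CDF evaluated at each outcome, for two hypothetical forecasters.
cdf_good = [gaussian_cdf(y, 0.0, 1.0) for y in outcomes]  # matches the data
cdf_wide = [gaussian_cdf(y, 0.0, 1.5) for y in outcomes]  # intervals too wide

curve_good = calibration_curve(cdf_good, levels)
curve_wide = calibration_curve(cdf_wide, levels)
err_good = cal_error(levels, curve_good)
err_wide = cal_error(levels, curve_wide)

for name, cdf_values in (("good", cdf_good), ("wide", cdf_wide)):
    c50 = central_coverage(cdf_values, 0.5)
    c90 = central_coverage(cdf_values, 0.9)
    print(f"{name}: 50% coverage {c50:.3f}, 90% coverage {c90:.3f}")
print(f"calibration error good: {err_good:.5f}, wide: {err_wide:.5f}")
```

Plotting `levels` against `curve_good` or `curve_wide` gives a calibration plot of the kind described for Figure 2; a forecaster whose intervals are too wide covers more than the nominal level and drifts away from the $y=x$ line.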