Diffusion models for probabilistic precipitation generation from atmospheric variables

Michael Aich, Sebastian Bathiany, Philipp Hess, Yu Huang, Niklas Boers

TL;DR

The paper addresses biases and resolution limits in traditional precipitation parameterizations by introducing a two-stage, data-driven framework that learns high-resolution precipitation from large-scale atmospheric variables. It combines a deterministic UNet regression at 1-degree resolution with a conditional diffusion model that generates 0.25-degree ensembles, trained exclusively on ERA5 to enable application to arbitrary ESMs. The approach substantially reduces spatial biases, improves statistics and extremes, and preserves large-scale climate trends under future scenarios while enabling fast, probabilistic projections. This method provides a computationally efficient, model-agnostic downscaling and bias-correction tool with significant potential for integration into climate modeling workflows.

Abstract

Improving the representation of precipitation in Earth system models (ESMs) is critical for assessing the impacts of climate change and especially of extreme events like floods and droughts. In existing ESMs, precipitation is not resolved explicitly, but represented by parameterizations. These typically rely on resolving approximated but computationally expensive column-based physics, not accounting for interactions between locations. They struggle to capture fine-scale precipitation processes and introduce significant biases. We present a novel approach, based on generative machine learning, which integrates a conditional diffusion model with a UNet architecture to generate accurate, high-resolution (0.25°) global daily precipitation fields from a small set of prognostic atmospheric variables. Unlike traditional parameterizations, our framework efficiently produces ensemble predictions, capturing uncertainties in precipitation, and does not require fine-tuning by hand. We train our model on the ERA5 reanalysis and present a method that allows us to apply it to arbitrary ESM data, enabling fast generation of probabilistic forecasts and climate scenarios. By leveraging interactions between global prognostic variables, our approach provides an alternative parameterization scheme that mitigates biases present in the ESM precipitation while maintaining consistency with its large-scale (annual) trends. This work demonstrates that complex precipitation patterns can be learned directly from large-scale atmospheric variables, offering a computationally efficient alternative to conventional schemes.

Paper Structure

This paper contains 13 sections, 6 figures.

Figures (6)

  • Figure 1: Schematic overview of our two-stage approach for generating high-resolution global precipitation from atmospheric variables. 1) Deterministic UNet Regression Model. We first train a UNet model to learn the mapping from four atmospheric variables at 1° resolution (corresponding to a typical ESM resolution), namely specific humidity at 850 hPa, near-surface eastward and northward wind components at 10 m, and sea-level pressure, to precipitation. During training, both the inputs and the 1° precipitation targets come from ERA5 reanalysis, ensuring the regression model captures the relationship between large-scale atmospheric variables and precipitation. 2) Generative Diffusion Model. We then train a conditional diffusion model (DM) to better model the fine-scale spatial structures that are lacking in the regression output. The inputs to this model are upsampled (1°) precipitation fields from ERA5, which we corrupt with noise in order to destroy their small-scale variability. The diffusion model learns to restore these small-scale patterns using high-resolution (0.25°) ERA5 precipitation as the training target. At inference, we take atmospheric variables from the ESM and apply the trained UNet regression model to generate a deterministic precipitation estimate. Before we use this estimate as a condition for our DM, we apply quantile delta mapping to reduce spatial biases inherited from the biased atmospheric variables of GFDL. The diffusion model is then conditioned on noisy quantile-mapped precipitation estimates. By sampling multiple times, we can generate large ensembles of global 0.25° precipitation predictions that are faithful to the large-scale dynamics, yet capture realistic small-scale variability.
  • Figure 2: Model biases of GFDL, UNet and DM. The maps show differences in time-mean precipitation between ERA5 and (A) GFDL, (B) the UNet regression model, and (C) the DM. The difference map of GFDL shows pronounced biases in the tropical regions, including a double ITCZ. Our regression model alone already improves over GFDL, but the DM further reduces deviations from ERA5 and yields the smallest mean absolute bias. Note that we downsampled ERA5 and the DM to 1° by applying average pooling for a fair comparison to GFDL.
  • Figure 3: Reproduction of the statistics of ERA5 at 0.25° resolution over 40 years. (A) Mean spatial power spectral density (PSD). The diffusion model corrects the small-scale spatial details and follows the target distribution closely. (B) Histogram indicating the precipitation frequencies. The histogram also shows large improvements, with slight deviations from the ERA5 reference data at 0.25° resolution for extreme precipitation. (C) Longitude profile, given by the data averaged over all latitudes, weighted by the cosine of latitude to account for the varying grid cell area. (D) Latitude profile, given by the data averaged over all longitudes. Our diffusion model approximates the latitude and longitude profiles of the original ERA5 reference data well. For panels (B-C), we bi-linearly upsample the GFDL and our regression model predictions of precipitation (orange/cyan) from 1° to the 0.25° resolution of ERA5 and the DM.
  • Figure 4: Evaluation of extreme event coverage for the historical period. (A-D): Climatology of R95p (annual total precipitation from days exceeding the 95th percentile) for the historical period simulated by (A) UNet at 1°, (B) GFDL at 1°, and (C) the Diffusion Model (DM) downsampled to 1°, compared to (D) ERA5 reanalysis downsampled to 1°. The GFDL model exhibits strong wet biases in the tropics (A), whereas the DM and UNet models show reduced biases (E, G), aligning well with the spatial patterns of ERA5.
  • Figure 5: Evaluation of the 50 DM ensemble members over one reference ERA5 year. Temporally averaged continuous ranked probability score (CRPS) (lower is better) for (A) applying our deterministic UNet and then bi-linearly interpolating to 0.25°, (B) our DM and (C) the bi-linearly upsampled deterministic ERA5 baseline. Our DM achieves the lowest CRPS overall. (D) shows that our DM also maintains the lowest spatially averaged CRPS throughout the year. (E) The spread-skill plot indicates that our DM closely follows the 1:1 line, demonstrating a well-calibrated spread of the DM with respect to the underlying uncertainties.
  • ...and 1 more figure
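
The Figure 3 caption mentions profiles weighted by the cosine of latitude. As a minimal sketch of that weighting, assuming a regular latitude-longitude grid stored as a (lat, lon) NumPy array, the two profiles could be computed as below. The function names `longitude_profile` and `latitude_profile` are hypothetical and not taken from the paper.

```python
import numpy as np

def longitude_profile(field, lats_deg):
    """Average a (lat, lon) field over latitude, one value per longitude.

    Grid cells shrink towards the poles, so each latitude row is weighted
    by cos(latitude) as a proxy for its grid-cell area.
    """
    weights = np.cos(np.deg2rad(lats_deg))
    return np.average(field, axis=0, weights=weights)

def latitude_profile(field):
    """Average a (lat, lon) field over longitude, one value per latitude.

    All cells in a latitude row have the same area, so no weighting is needed.
    """
    return np.mean(field, axis=1)

# Toy example: two latitude rows (equator and 60 degrees N), two longitudes.
toy_field = np.array([[1.0, 1.0], [3.0, 3.0]])
toy_lats = np.array([0.0, 60.0])
print(longitude_profile(toy_field, toy_lats))  # weighted towards the equator row
print(latitude_profile(toy_field))
```

In the toy example, the 60° row receives half the weight of the equator row, so the weighted mean lies closer to the equatorial value than a plain average would.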
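
The R95p index in the Figure 4 caption (annual total precipitation from days exceeding the 95th percentile) can be illustrated with a short sketch. It assumes daily precipitation in mm/day with time as the first axis, and it takes the percentile over wet days only (at least 1 mm/day), which is a common convention for this index but is an assumption here, since the caption does not specify it. The names `wet_day_p95`, `r95p` and `WET_DAY_MM` are hypothetical.

```python
import numpy as np

WET_DAY_MM = 1.0  # assumed wet-day threshold in mm/day (not stated in the caption)

def wet_day_p95(daily_precip):
    """95th percentile of wet-day precipitation along the time axis (axis 0)."""
    daily = np.asarray(daily_precip, dtype=float)
    # Mask dry days so they do not enter the percentile.
    wet_only = np.where(daily >= WET_DAY_MM, daily, np.nan)
    return np.nanpercentile(wet_only, 95, axis=0)

def r95p(daily_precip, threshold):
    """Total precipitation from days whose amount exceeds the threshold."""
    daily = np.asarray(daily_precip, dtype=float)
    return np.where(daily > threshold, daily, 0.0).sum(axis=0)

# Toy example: a four-day record at a single grid cell.
toy_days = np.array([0.0, 2.0, 4.0, 10.0])
toy_threshold = wet_day_p95(toy_days)
print(toy_threshold, r95p(toy_days, toy_threshold))
```

In practice the threshold would be computed once per grid cell from a reference period and then applied to each year separately, with the yearly totals averaged to obtain a climatology.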
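
The CRPS in the Figure 5 caption compares an ensemble of predictions against a single reference value. The sketch below uses the standard ensemble estimator, CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are ensemble members and y is the reference. Whether the paper uses exactly this estimator is an assumption, and the function name `ensemble_crps` is hypothetical.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Ensemble CRPS estimate per grid cell (lower is better).

    members: array of shape (n_members, ...) with ensemble predictions.
    obs:     array of shape (...) with the reference values.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_members = members.shape[0]

    # First term: mean absolute error of the members against the reference.
    mae = np.mean(np.abs(members - obs), axis=0)

    # Second term: mean absolute difference over all member pairs.
    # Looping over members avoids building an (n, n, ...) array in memory.
    pair_sum = np.zeros_like(mae)
    for i in range(n_members):
        pair_sum += np.sum(np.abs(members[i] - members), axis=0)
    spread = pair_sum / (n_members * n_members)

    return mae - 0.5 * spread

# Toy example: a two-member ensemble around a reference value of 2.0.
print(ensemble_crps([1.0, 3.0], 2.0))
```

For a single-member ensemble the second term vanishes and the score reduces to the absolute error, which is why a deterministic baseline can be compared with an ensemble on the same scale.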