Machine learning emulation of precipitation from km-scale regional climate simulations using a diffusion model

Henry Addison; Elizabeth Kendon; Suman Ravuri; Laurence Aitchison; Peter AG Watson

TL;DR

CPMGEM introduces a diffusion-model emulator that reproduces daily-mean precipitation at $8.8$ km resolution from $60$ km global climate model (GCM) inputs, trained on output from the UK convection-permitting model (CPM). The approach yields realistic spatial structure and representation of extreme events, while generating samples orders of magnitude faster than running a km-scale CPM. The emulator transfers to GCM inputs, captures many aspects of the 21st-century climate change signal (notably in summer), and remains effective even with limited training data. The method enables large-ensemble, high-resolution rainfall projections across multiple GCMs and scenarios, with potential applications in flood risk assessment, adaptation planning and uncertainty quantification.

Abstract

High-resolution climate simulations are valuable for understanding climate change impacts. This has motivated use of regional convection-permitting climate models (CPMs), but these are very computationally expensive. We present a convection-permitting model generative emulator (CPMGEM) that skilfully emulates precipitation simulations by a 2.2km-resolution regional CPM at much lower cost. It is based on a diffusion model, a generative machine learning approach, and takes inputs at the 60km resolution of the driving global climate model (GCM), downscaling them to 8.8km at daily-mean time resolution and capturing the effect of convective processes represented in the CPM at these scales. The emulator is trained on simulations over England and Wales from the United Kingdom Climate Projections Local product, covering years between 1980 and 2080 under a high emissions scenario. The spatial structure and intensity distribution of the output precipitation are similarly realistic to those of the CPM simulations. The emulator is stochastic, which improves the realism of samples. We show evidence that the emulator has skill for extreme events with ~100-year return times. It captures the main features of the simulated 21st-century climate change but exhibits some error in the magnitude. We demonstrate successful transfer from a "perfect model" training setting to application using GCM variable inputs. We also show that the method can be useful in situations with limited amounts of high-resolution data. Potential applications include producing high-resolution precipitation predictions for large-ensemble climate simulations and producing output based on different GCMs and climate change scenarios to better sample uncertainty.
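The abstract describes the emulator as a stochastic diffusion model that can draw any number of high-resolution samples for one set of coarse inputs. The sketch below illustrates that idea with a standard DDPM-style ancestral sampler in NumPy. It is not CPMGEM's implementation: the linear noise schedule, the number of steps and the hypothetical `toy_eps_model` are assumptions. The toy denoiser is the exact noise predictor for a Gaussian target centred on the conditioning field, standing in for a trained neural network.

```python
import numpy as np


def make_schedule(n_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule (an assumed choice, as in the original DDPM setup)."""
    betas = np.linspace(beta_min, beta_max, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, t, cond, alpha_bars, target_std=0.5):
    """Hypothetical stand-in for the trained network.

    If the high-resolution field given `cond` were Gaussian, N(cond, target_std**2),
    this is the exact expected noise E[eps | x_t]. A real emulator would use a
    neural network trained on CPM precipitation and coarse predictors instead.
    """
    ab = alpha_bars[t]
    var_xt = ab * target_std**2 + (1.0 - ab)
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * cond) / var_xt


def sample(cond, n_samples, rng, n_steps=1000):
    """Draw n_samples stochastic fields for one conditioning input (ancestral sampling)."""
    betas, alphas, alpha_bars = make_schedule(n_steps)
    x = rng.standard_normal((n_samples,) + cond.shape)  # start from pure noise
    for t in range(n_steps - 1, -1, -1):
        eps = toy_eps_model(x, t, cond, alpha_bars)
        # Mean of x_{t-1} given x_t, using the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = np.full((4, 4), 1.5)  # hypothetical conditioning field
    fields = sample(cond, n_samples=500, rng=rng)
    print(fields.shape)  # (500, 4, 4)
    print(fields.mean(), fields.std())  # roughly 1.5 and 0.5
```

Repeated calls with the same `cond` return different fields. This is the property that lets an emulator of this kind represent a distribution of plausible precipitation fields rather than a single deterministic estimate.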

Paper Structure

This paper contains 26 sections, 4 equations, 11 figures and 5 tables.

Introduction
Materials and methods
Data
Target CPM precipitation
Coarse predictors
ML Models
CPMGEM: Diffusion model-based emulator
Comparison methods
Training
Training, validation and test datasets
Variable transformations
Adjustment of GCM inputs
Evaluation diagnostics
Radially Averaged Power Spectral Density
Spread-Error plot
...and 11 more sections
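The outline names two of the evaluation diagnostics, the radially averaged power spectral density (RAPSD) and the spread-error plot. The following NumPy sketch shows one common way to compute each; the paper's normalisation, binning and pooling over grid cells and days may differ. `rapsd` assumes a square grid, and `spread_error` assumes samples arranged as (ensemble members, points).

```python
import numpy as np


def rapsd(field):
    """Radially averaged power spectral density of a square 2D field.

    Returns integer wavenumbers 1..n/2 (cycles per domain) and the mean
    spectral power in each annulus. The normalisation is an assumption.
    """
    n = field.shape[0]
    assert field.shape == (n, n), "this sketch assumes a square grid"
    spectrum2d = np.fft.fftshift(np.fft.fft2(field - field.mean()))
    power = np.abs(spectrum2d) ** 2 / field.size
    # Wavenumber components relative to the shifted zero-frequency bin
    ky, kx = np.indices(field.shape) - n // 2
    radius = np.hypot(kx, ky).round().astype(int)
    # Average the power over annuli of equal integer radius
    totals = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    mean_power = totals / np.maximum(counts, 1)
    k = np.arange(1, n // 2 + 1)  # skip k=0 (mean removed); stop at Nyquist
    return k, mean_power[k]


def spread_error(samples, target, n_bins=10):
    """Binned ensemble spread versus RMSE of the ensemble mean.

    samples: array (n_members, n_points), e.g. emulator samples per grid cell-day
    target:  array (n_points,), the corresponding CPM values
    Points are sorted by ensemble variance and split into equal-sized bins.
    """
    ens_mean = samples.mean(axis=0)
    ens_var = samples.var(axis=0, ddof=1)
    sq_err = (ens_mean - target) ** 2
    bins = np.array_split(np.argsort(ens_var), n_bins)
    spread = np.array([np.sqrt(ens_var[b].mean()) for b in bins])
    error = np.array([np.sqrt(sq_err[b].mean()) for b in bins])
    return spread, error
```

For a well-calibrated ensemble of M members, the RMSE of the ensemble mean exceeds the RMS spread by a factor of about sqrt(1 + 1/M), so points sit slightly above the 1:1 line; an under-dispersed ensemble sits well above it.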

Figures (11)

Figure 1: Schematic diagram of the inputs and outputs of the emulator. The emulator is trained to stochastically generate samples of high-resolution, daily-mean precipitation over England and Wales (bottom panels). The target is for these samples to have properties matching output from the Met Office UK convection-permitting model. The emulator is stochastic and can generate any number of samples for a single set of inputs. As input fields, the emulator takes variables at the same 60km grid spacing as the global climate model runs used to drive the CPM. These input fields are mean sea-level pressure, together with specific humidity, temperature and vorticity at 250, 500, 700 and 850hPa (all daily means).
Figure 2: Examples of predictions of daily-mean precipitation. The first row shows results for a wet day in winter (December–February, DJF), chosen at the 80th percentile of the domain-mean precipitation. The second row shows the wettest winter day in the 108-year test dataset. The third and fourth rows are similar but for summer (June–August, JJA). The first column is the precipitation from the convection-permitting model (CPM). The second column is the coarsened CPM precipitation bilinearly interpolated to the 8.8km resolution of our emulator. Column 3 is an example coarse-resolution input field, the 850hPa vorticity. Contours, in grey, are drawn in steps of $2\times10^{-5}\textrm{s}^{-1}$ between $-10^{-4}\textrm{s}^{-1}$ and $+10^{-4}\textrm{s}^{-1}$, with dashed lines for negative values and solid lines for positive values. Columns 4 and 5 are samples chosen at random from the emulator using coarsened CPM atmospheric variables as predictors (CPMGEM_cCPM). Column 6 is the prediction by U-Net. Note that, because precipitation downscaling is highly stochastic, samples from the diffusion model are not expected to match the CPM precipitation in full detail; rather, they should represent the distribution of plausible precipitation fields for the given low-resolution predictors, of which the CPM simulation output is a single example.
Figure 3: Statistical properties of predictions. (a) Histograms of precipitation values on the 8.8km grid. The grey shaded area is the frequency density of the target CPM precipitation. The lines show frequency densities from the diffusion model emulator acting on coarsened CPM and GCM inputs ("CPMGEM_cCPM" (blue) and "CPMGEM_GCM" (green), respectively), U-Net_cCPM (orange), and CPM precipitation coarsened to 60km resolution and bilinearly interpolated back to the 8.8km grid (cCPM Bilinear; dark grey). Note that the vertical axis is logarithmic. (b) Relative mean bias of each model as a percentage of the CPM mean. (c) Same as (b) but for the standard deviation bias.
Figure 4: Spread of predictions. (a) Scatter plot of daily domain-mean precipitation for samples from CPMGEM_cCPM versus target CPM values. (b) Same as (a) for U-Net_cCPM. (c) Spread-error plot of CPMGEM_cCPM, indicating the calibration of the stochastic component (see the Spread-Error plot section under Evaluation diagnostics for details).
Figure 5: Radially averaged spatial power spectral density (RAPSD) of the target CPM precipitation (grey dashes), the emulator samples CPMGEM_cCPM (blue line) and CPMGEM_GCM (green line), U-Net_cCPM (orange) and cCPM Bilinear (dark grey).
...and 6 more figures
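Figures 2 and 3 compare the emulator with a "cCPM Bilinear" baseline and report mean and standard-deviation biases relative to the CPM. A minimal NumPy sketch of both follows. It assumes the coarse grid spacing is an integer multiple of the fine one and coarsens by block averaging; this is a simplification, since 60km is not an integer multiple of 8.8km and the paper's regridding method is not stated here. The hypothetical `relative_bias` pools all values, whereas the paper may compute biases per grid cell.

```python
import numpy as np


def coarsen(field, factor):
    """Block-average a 2D fine-grid field by an integer factor."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def bilinear_upsample(coarse, factor):
    """Bilinearly interpolate a coarse field back onto the fine grid.

    Bilinear interpolation is separable, so it is done as linear interpolation
    along x and then along y. Values beyond the outermost coarse-cell centres
    are held constant (np.interp's edge behaviour).
    """
    cy, cx = coarse.shape
    # Coarse-cell centres expressed in fine-grid index units
    centres_y = (np.arange(cy) + 0.5) * factor - 0.5
    centres_x = (np.arange(cx) + 0.5) * factor - 0.5
    fine_y = np.arange(cy * factor)
    fine_x = np.arange(cx * factor)
    along_x = np.array([np.interp(fine_x, centres_x, row) for row in coarse])
    return np.array([np.interp(fine_y, centres_y, col) for col in along_x.T]).T


def relative_bias(pred, target):
    """Mean and standard-deviation bias of pred, as percentages of the target's."""
    mean_bias = 100.0 * (pred.mean() - target.mean()) / target.mean()
    std_bias = 100.0 * (pred.std() - target.std()) / target.std()
    return mean_bias, std_bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical skewed "precipitation" on a 64x64 fine grid
    fine = rng.gamma(shape=0.5, scale=4.0, size=(64, 64))
    baseline = bilinear_upsample(coarsen(fine, 8), 8)
    print(baseline.shape)  # (64, 64)
    print(relative_bias(baseline, fine))  # std bias strongly negative: smoothing removes variance
```

The example shows why a coarsen-and-interpolate baseline has a large negative standard-deviation bias: averaging over coarse cells removes the small-scale variability that a generative emulator is designed to restore.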
