Physics-Constrained Adaptive Flow Matching for Climate Downscaling

Kevin Debeire, Aytaç Paçal, Pierre Gentine, Luis Medrano-Navarro, Nils Thuerey, Veronika Eyring

Abstract

Regional climate information at kilometer scales is essential for assessing the impacts of climate change, but generating it with global climate models is computationally prohibitive. Machine learning models offer a fast alternative, yet they often violate basic physical laws and degrade when applied to climates outside their training distribution. We present Physics-Constrained Adaptive Flow Matching (PC-AFM), a generative downscaling model that addresses both problems. Building on the Adaptive Flow Matching (AFM) model of Fotiadis et al. (2025) as our baseline, we add soft conservation constraints that keep the downscaled output consistent with the large-scale input for precipitation and humidity, and we use gradient surgery via the ConFIG algorithm to prevent these constraints from interfering with the generative objective. We train the model on Central Europe climate data and evaluate it on a tenfold downscaling task (63 km to 6.3 km) over six variables (near-surface temperature, precipitation, specific humidity, surface pressure, and the two horizontal wind components). Evaluation metrics include bias, ensemble skill scores, power spectra, and conservation error, and we test generalization on two held-out climate regions. Within the training distribution, PC-AFM reduces conservation errors and improves ensemble calibration while matching the baseline on standard skill metrics. Outside the training distribution, where unconstrained models develop large systematic errors by extrapolating learned statistics, PC-AFM halves the precipitation wet bias, reduces conservation error, and improves extreme-quantile accuracy, all without any information about the target climate at inference time. These results indicate that physical consistency is a practical requirement for deploying generative downscaling models in real-world applications.
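
To make the idea of a soft conservation constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes that consistency with the large-scale input is measured by averaging the downscaled field over non-overlapping blocks (10 by 10 in the paper's setting, 2 by 2 in the toy example) and penalizing the mean squared mismatch with the coarse input. The block-mean operator, the squared-error form, and the names `block_mean` and `conservation_penalty` are illustrative assumptions. Plain Python lists are used only to keep the example dependency-free.

```python
from typing import List

Field = List[List[float]]  # 2-D field stored as rows of floats


def block_mean(hi: Field, factor: int) -> Field:
    """Coarse-grain a high-resolution field by averaging factor x factor blocks."""
    rows, cols = len(hi), len(hi[0])
    if rows % factor or cols % factor:
        raise ValueError("field shape must be divisible by the factor")
    coarse = []
    for i in range(0, rows, factor):
        coarse_row = []
        for j in range(0, cols, factor):
            block = [hi[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def conservation_penalty(hi: Field, lo: Field, factor: int) -> float:
    """Mean squared mismatch between the coarse-grained output and the input.

    Assumed form of the penalty; the paper's exact loss may differ.
    """
    coarse = block_mean(hi, factor)
    diffs = [(c - l) ** 2
             for c_row, l_row in zip(coarse, lo)
             for c, l in zip(c_row, l_row)]
    return sum(diffs) / len(diffs)


# Toy example: a 4x4 "downscaled" field and a 2x2 coarse counterpart.
hi_res = [
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 4.0],
    [2.0, 2.0, 5.0, 5.0],
    [2.0, 2.0, 5.0, 5.0],
]
lo_res = block_mean(hi_res, factor=2)

# Zero when the coarse input equals the block means of the output.
consistent = conservation_penalty(hi_res, lo_res, factor=2)
# Positive when the output does not aggregate back to the coarse input.
violated = conservation_penalty(hi_res, [[0.0, 0.0], [0.0, 0.0]], factor=2)
```

Under this assumed form, the penalty only constrains block averages, so the model remains free to redistribute precipitation or humidity within each coarse cell.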

Paper Structure

This paper contains 19 sections, 14 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Geographic domains used for training and evaluation. Evaluation diagnostics for all three domains are computed using ESMValTool. The Central Europe domain (blue) is used for training. The Iberian Peninsula (green) and Northern Europe region (orange) are withheld from training and used exclusively for out-of-distribution evaluation.
  • Figure 2: Overview of the PC-AFM architecture and training procedure. Top: At inference, the low-resolution input (32$\times$32, 63 km) is bilinearly upsampled and passed through the learned encoder $E_\psi$ to produce an initial high-resolution estimate $\hat{x}_0$. A stochastic interpolant $x_t = (1-t)\hat{x}_0 + t x_1 + \sigma_t \varepsilon$ is constructed and refined by the denoiser $D_\theta$, conditioned on the low-resolution input and noise level $\sigma_t$. Fifty denoising steps yield 12 ensemble members at 320$\times$320 resolution (6.3 km). To support geographic generalization, neither network uses positional embeddings. Bottom: During training, gradients from the generative objective ($\mathcal{L}_\text{AFM} + \lambda_\text{enc}\,\mathcal{L}_\text{enc}$) and the physics-constrained conservation objective ($\mathcal{L}_\text{phys}$) are combined via the ConFIG operator, which ensures non-negative progress on both objectives simultaneously. The conservation penalty is down-weighted at high noise levels and disabled during a 200k-sample warmup period.
  • Figure 3: Relative performance of PC-AFM versus AFM-baseline for the Central Europe training region. Each cell shows the ratio of PC-AFM to AFM-baseline; values below 1 (green) indicate improvement. Conservation error is not applicable ("--") for variables without an explicit conservation constraint. Bold entries summarize row and column averages.
  • Figure 4: Precipitation (pr) evaluation for the Central Europe training region. (A) Spatial maps of time-mean bias, relative bias, CRPS, and conservation error for AFM-baseline (top) and PC-AFM (bottom). (B) Radially averaged power spectral density. (C) Log-transformed marginal PDF. (D) Rank histograms with MCB. PC-AFM halves the conservation error and improves ensemble calibration (MCB: 0.767 to 0.523) while maintaining comparable bias and CRPS.
  • Figure 5: Quantile MAE relative performance (PC-AFM / AFM-baseline) for impact-relevant diagnostics in the Central Europe training region. Average ratio: 0.71.
  • ...and 6 more figures
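
The two mechanisms named in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the authors' code. The first function evaluates the stochastic interpolant $x_t = (1-t)\hat{x}_0 + t x_1 + \sigma_t \varepsilon$ on flattened fields, taking $\sigma_t$ as an argument because its schedule is not given here. The second combines two gradients using the two-gradient form of the ConFIG operator as we understand it: the update direction is the normalized sum of the unit orthogonal components of each gradient with respect to the other, scaled by the summed projections. The helper names, the collinear fallback, and the toy gradients are assumptions for illustration; the noise-level weighting and warmup are omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(a: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


def interpolant(x0_hat: Sequence[float], x1: Sequence[float], t: float,
                sigma_t: float, rng: random.Random) -> Vector:
    """x_t = (1 - t) * x0_hat + t * x1 + sigma_t * eps, with eps ~ N(0, I)."""
    return [(1.0 - t) * a + t * b + sigma_t * rng.gauss(0.0, 1.0)
            for a, b in zip(x0_hat, x1)]


def orthogonal_part(g_ref: Sequence[float], g: Sequence[float]) -> Vector:
    """Component of g that is orthogonal to g_ref."""
    scale = dot(g_ref, g) / dot(g_ref, g_ref)
    return [x - scale * r for x, r in zip(g, g_ref)]


def config_two(g1: Sequence[float], g2: Sequence[float],
               eps: float = 1e-12) -> Vector:
    """Two-gradient conflict-free update (sketch of the ConFIG operator)."""
    summed = [a + b for a, b in zip(g1, g2)]
    if dot(g1, g1) < eps or dot(g2, g2) < eps:
        return summed  # assumed fallback: one gradient is (near) zero
    o12 = orthogonal_part(g1, g2)
    o21 = orthogonal_part(g2, g1)
    if dot(o12, o12) < eps or dot(o21, o21) < eps:
        # Assumed fallback for collinear gradients: plain sum if aligned,
        # zero update if opposed (no conflict-free direction exists).
        return summed if dot(g1, g2) >= 0.0 else [0.0 for _ in g1]
    g_v = unit([a + b for a, b in zip(unit(o12), unit(o21))])
    scale = dot(g1, g_v) + dot(g2, g_v)
    return [scale * v for v in g_v]


# Toy example: the generative and physics gradients conflict (negative dot).
g_gen = [1.0, 0.0]
g_phys = [-0.5, 1.0]
update = config_two(g_gen, g_phys)

# Interpolant halfway between the encoder estimate and the target, no noise.
x_t = interpolant([0.0, 0.0], [1.0, 2.0], t=0.5, sigma_t=0.0,
                  rng=random.Random(0))
```

In the toy example, `update` has a positive inner product with both `g_gen` and `g_phys`, so a small step against it decreases both losses to first order, whereas the plain sum of conflicting gradients offers no such guarantee.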