
Training-Free Data Assimilation with GenCast

Thomas Savary, François Rozet, Gilles Louppe

TL;DR

This work addresses Bayesian state estimation for dynamical systems without additional training by combining pre-trained diffusion models with particle filters. Using GenCast as the diffusion prior, it develops a training-free data assimilation workflow, implemented as a Fully-Adapted Auxiliary Particle Filter (FA-APF) with inflation, that samples from the optimal proposal using a posterior-score decomposition, estimates the mean dynamics with a denoiser, and computes particle weights via a Dirac approximation. Empirical results in a GenCast-based global weather setting with 256 particles show that the FA-APF yields stable skill for both observed and unobserved variables, outperforming unconditional GenCast forecasts while maintaining nonzero ensemble spread. The approach is lightweight and applies broadly to autoregressive diffusion models, enabling operational data assimilation without retraining and offering a natural path toward diffusion-based reanalysis.
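To make the weighting and proposal steps concrete, here is a minimal sketch of one FA-APF step on a toy linear-Gaussian system, where the optimal proposal $p(x^{k+1} \mid x^{k}, y^{k+1})$ and the mean dynamics are available in closed form. In the paper these quantities come from GenCast instead: the proposal is sampled with the diffusion model through the posterior-score decomposition $\nabla \log p(x^{k+1}_t \mid x^k, y^{k+1}) = \nabla \log p(x^{k+1}_t \mid x^k) + \nabla \log p(y^{k+1} \mid x^{k+1}_t, x^k)$, and the mean is estimated with the denoiser. The matrices A, Q, H, R, the functions `fa_apf_step` and `run_demo`, and all numerical values are hypothetical, and the inflation step is not reproduced.

```python
import numpy as np


def gaussian_logpdf(y, means, cov):
    """log N(y | mean_i, cov) for each row mean_i of `means` (shape (N, d_y))."""
    diff = y - means                                  # (N, d_y)
    sol = np.linalg.solve(cov, diff.T).T              # cov^{-1} (y - mean_i)
    _, logdet = np.linalg.slogdet(cov)
    d = y.shape[-1]
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2 * np.pi))


def fa_apf_step(particles, y, A, Q, H, R, rng):
    """One FA-APF step on a hypothetical toy model standing in for the emulator.

    Toy model (assumption): x' = A x + N(0, Q), y = H x' + N(0, R).
    Returns the propagated particles and the first-stage weights.
    """
    n = particles.shape[0]
    means = particles @ A.T                           # mean dynamics E[x' | x]
    # First-stage weights with a Dirac approximation of p(x' | x) at its mean:
    # p(y | x) ~= N(y | H E[x' | x], R). The exact weight in this toy model
    # would use covariance H Q H^T + R instead of R.
    logw = gaussian_logpdf(y, means @ H.T, R)
    w = np.exp(logw - logw.max())                     # avoid underflow
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)                  # multinomial resampling
    # Optimal proposal p(x' | x, y): closed-form Gaussian in this toy model.
    Qi, Ri = np.linalg.inv(Q), np.linalg.inv(R)
    S = np.linalg.inv(Qi + H.T @ Ri @ H)              # proposal covariance
    mu = (means[idx] @ Qi + y @ Ri @ H) @ S           # proposal means (rows)
    L = np.linalg.cholesky(S)
    new = mu + rng.standard_normal(mu.shape) @ L.T
    # As in a fully adapted filter, the new particles are treated as equally
    # weighted; with the Dirac approximation this is itself an approximation.
    return new, w


def run_demo(steps=50, n=256, seed=0):
    """Compare filter and unconditional ensembles on a rotating 2-D toy system."""
    rng = np.random.default_rng(seed)
    th = 0.3
    A = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    Q = 0.01 * np.eye(2)
    H = np.array([[1.0, 0.0]])                        # only x[0] is observed
    R = np.array([[0.01]])
    x = np.array([2.0, 0.0])                          # hypothetical true state
    filt = rng.standard_normal((n, 2))                # assimilating ensemble
    free = filt.copy()                                # unconditional ensemble
    err_f, err_u = [], []
    for _ in range(steps):
        x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
        y = H @ x + rng.multivariate_normal(np.zeros(1), R)
        filt, _ = fa_apf_step(filt, y, A, Q, H, R, rng)
        free = free @ A.T + rng.multivariate_normal(np.zeros(2), Q, size=n)
        err_f.append(np.linalg.norm(filt.mean(axis=0) - x))
        err_u.append(np.linalg.norm(free.mean(axis=0) - x))
    # Average the error of each ensemble mean over the last half of the run
    return float(np.mean(err_f[-25:])), float(np.mean(err_u[-25:]))


print(run_demo())
```

`run_demo()` returns the time-averaged error of the filter ensemble mean and of the unconditional ensemble mean. Only the first state component is observed, so the second one is recovered through the dynamics, loosely mirroring the observed/unobserved split discussed in the paper.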

Abstract

Data assimilation is widely used in many disciplines such as meteorology, oceanography, and robotics to estimate the state of a dynamical system from noisy observations. In this work, we propose a lightweight and general method to perform data assimilation using diffusion models pre-trained for emulating dynamical systems. Our method builds on particle filters, a class of data assimilation algorithms, and does not require any further training. As a guiding example throughout this work, we illustrate our methodology on GenCast, a diffusion-based model that generates global ensemble weather forecasts.

Paper Structure

This paper contains 15 sections, 1 theorem, 14 equations, 7 figures, and 1 algorithm.

Key Result

Theorem A.1

Assuming that $p(x^{k}_{t}\mid x^{k}) = \mathcal{N}(x^{k}_{t} \mid \alpha_{t} x^{k}, \sigma_{t}^{2}I)$ and that $x^{k+1}_{t}$ is conditionally independent of $x^{k}$ given $x^{k+1}$, the first moment of the distribution $p(x^{k+1}_{t} \mid x^{k})$ is linked to the score function $\nabla_{x^{k+1}_{t}} \log p(x^{k+1}_{t} \mid x^{k})$ …
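For intuition, under the same Gaussian perturbation kernel and conditional independence assumption, the standard Tweedie identity reads $\alpha_t \mathbb{E}[x^{k+1} \mid x^{k+1}_t, x^k] = x^{k+1}_t + \sigma_t^2 \nabla_{x^{k+1}_t} \log p(x^{k+1}_t \mid x^k)$; this is the sense in which a denoiser gives access to conditional means. This is not claimed to be the exact statement of Theorem A.1. The sketch below checks the identity numerically in a scalar toy case where the transition $p(x^{k+1} \mid x^k)$ is assumed Gaussian with hypothetical mean `m` and variance `s2`, so both sides have closed forms.

```python
def gaussian_transition_toy(m, s2, alpha, sigma2):
    """Scalar toy with a hypothetical Gaussian transition p(x' | x^k) = N(m, s2).

    The perturbation kernel is p(x_t | x') = N(alpha * x', sigma2), so
    p(x_t | x^k) = N(alpha * m, alpha**2 * s2 + sigma2).
    """
    var_t = alpha**2 * s2 + sigma2

    def score(x_t):
        # Score of p(x_t | x^k) with respect to x_t
        return -(x_t - alpha * m) / var_t

    def denoiser(x_t):
        # Posterior mean E[x' | x_t, x^k] from the Gaussian conjugate update
        return m + (alpha * s2 / var_t) * (x_t - alpha * m)

    return score, denoiser


def tweedie_gap(x_t, m=0.7, s2=0.5, alpha=0.8, sigma2=0.3):
    """|alpha * E[x' | x_t] - (x_t + sigma2 * score(x_t))|, zero up to rounding."""
    score, denoiser = gaussian_transition_toy(m, s2, alpha, sigma2)
    return abs(alpha * denoiser(x_t) - (x_t + sigma2 * score(x_t)))


for x_t in (-2.0, 0.0, 1.5):
    print(x_t, tweedie_gap(x_t))
```

Averaging the identity over $x^{k+1}_t \sim p(x^{k+1}_t \mid x^k)$, where the expected score is zero, gives $\mathbb{E}[x^{k+1}_t \mid x^k] = \alpha_t \mathbb{E}[x^{k+1} \mid x^k]$, tying the first moment of the noised transition to the mean dynamics.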

Figures (7)

  • Figure 1: Conditional posterior predictive distribution (blue curve) and corresponding observation (red dashed line) at an arbitrarily chosen point of the grid with coordinates (lat=50, lon=5) for the surface temperature variable. The observation is consistent with the posterior predictive distribution.
  • Figure 2: Skill comparison between the FA-APF (blue curve) and the ensemble of unconditional GenCast trajectories (red curve) for the surface temperature (left), the surface U component of wind (middle) and the geopotential at 500 hPa (right). After 7 days of observations, the FA-APF reaches a low and roughly constant skill, even for unobserved variables.
  • Figure 3: Skill for temperature, geopotential, V component of wind and specific humidity at three different pressure levels (100, 250 and 850 hPa). The skill reaches a plateau after a certain number of time steps for all variables (even those that are not observed), well below that of GenCast's forecasts.
  • Figure 4: Spread for temperature, geopotential, V component of wind and specific humidity at three different pressure levels (100, 250 and 850 hPa). The spread is non-zero and of the same order of magnitude as the skill, indicating that we capture a distribution rather than collapsing onto a single mode.
  • Figure 5: Comparison of surface temperature between the reference ERA5 trajectory (first row), the FA-APF ensemble mean (second row), and the GenCast ensemble mean (third row) after 3, 7, and 15 days.
  • ...and 2 more figures
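The captions refer to skill and spread without defining them here. A common convention in ensemble weather verification, assumed for this sketch rather than taken from the paper, is skill = root-mean-square error of the ensemble mean against the reference (ERA5 in Figure 5) and spread = square root of the mean ensemble variance, both with cos(latitude) area weights. The function names and the (members, lat, lon) array layout are hypothetical.

```python
import numpy as np


def area_weights(lat_deg):
    """Hypothetical cos(latitude) weights, normalized to mean 1 over latitudes."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def skill_and_spread(ensemble, reference, lat_deg):
    """ensemble: (members, lat, lon); reference: (lat, lon); lat_deg: (lat,).

    Skill: area-weighted RMSE of the ensemble mean vs. the reference.
    Spread: square root of the area-weighted mean ensemble variance.
    """
    w = area_weights(lat_deg)[:, None]          # broadcast over longitude
    ens_mean = ensemble.mean(axis=0)
    skill = np.sqrt(np.mean(w * (ens_mean - reference) ** 2))
    ens_var = ensemble.var(axis=0, ddof=1)      # unbiased variance over members
    spread = np.sqrt(np.mean(w * ens_var))
    return float(skill), float(spread)
```

Under this convention, a nonzero spread of the same order as the skill (Figure 4) is the usual sign that the ensemble is neither collapsed nor badly overdispersed.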

Theorems & Definitions (2)

  • Theorem A.1
  • Proof