DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants
Martin Andrae, Erik Larsson, So Takao, Tomas Landelius, Fredrik Lindsten
TL;DR
DAISI targets data assimilation in high-dimensional, nonlinear settings by combining a stationary flow-based prior learned from dynamical systems with a forecast ensemble through inverse sampling. The method maps forecast states into a latent space via backward SDE inversion, then performs guided forward sampling to produce approximate samples from the filtering distribution, which enables zero-shot conditioning on observations with flexible guidance. Across Lorenz '63, SQG, and SEVIR, DAISI yields accurate, temporally coherent filtering under sparse, noisy, and multimodal observations, often matching or surpassing traditional filters while maintaining ensemble diversity. The framework is modular and scalable, it is interpretable through the hyperparameters $t_{\min}$ and $\epsilon$, and it leaves clear directions for improving exactness and computational efficiency.
Abstract
Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations and heuristic tuning (e.g., inflation and localization) to scale to high dimensions. While often successful, these approximations can make the methods unstable or inaccurate when the underlying distributions of states and observations depart significantly from Gaussianity. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior to assimilate observations via guidance-based conditional sampling while incorporating forecast information through a novel inverse-sampling step. This step maps the forecast ensemble into a latent space to provide initial conditions for the conditional sampling, allowing us to encode model dynamics into the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle.
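The invert-then-guide idea described in the abstract can be illustrated with a minimal sketch on a one-dimensional toy problem. This is not the authors' implementation, and everything in it is an assumption made for illustration: the prior is a Gaussian N(MU, SIGMA^2) with a linear interpolant x_t = (1 - t) z + t x_1, so the velocity field and the guidance term are available in closed form instead of coming from a trained network; the inversion uses a deterministic ODE rather than the backward SDE mentioned in the TL;DR (so there is no counterpart of $\epsilon$); and `t_min` is interpreted here as the time down to which the forecast is inverted. All function names are hypothetical.

```python
import numpy as np

MU, SIGMA = 0.0, 1.0  # toy stationary prior N(MU, SIGMA^2) (assumed)
R = 0.5               # observation noise std for y = x_1 + noise (assumed)


def var_t(t):
    """Variance of x_t = (1 - t) z + t x_1 with z ~ N(0, 1)."""
    return (1.0 - t) ** 2 + (t * SIGMA) ** 2


def velocity(x, t):
    """Closed-form prior velocity E[x_1 - z | x_t = x] for the toy prior."""
    a = (t * SIGMA**2 - (1.0 - t)) / var_t(t)
    return MU + a * (x - t * MU)


def guidance(x, t, y):
    """Velocity correction ((1 - t) / t) * d/dx log p(y | x_t = x)."""
    gain = t * SIGMA**2 / var_t(t)                    # d E[x_1 | x_t] / dx
    x1_hat = MU + gain * (x - t * MU)                 # E[x_1 | x_t = x]
    cond_var = SIGMA**2 * (1.0 - t) ** 2 / var_t(t)   # Var[x_1 | x_t]
    grad_loglik = gain * (y - x1_hat) / (cond_var + R**2)
    return (1.0 - t) / t * grad_loglik


def integrate(x, t_start, t_end, y=None, n_steps=500):
    """Explicit Euler integration; runs backward when t_end < t_start."""
    ts = np.linspace(t_start, t_end, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity(x, t0)
        if y is not None:
            v = v + guidance(x, t0, y)  # condition on the observation
        x = x + (t1 - t0) * v
    return x


def assimilate(forecast, y, t_min=0.3):
    """Invert the forecast ensemble to t_min, then sample forward with guidance."""
    latent = integrate(forecast, 1.0, t_min)    # inverse sampling (unguided)
    return integrate(latent, t_min, 1.0, y=y)   # guided forward sampling


rng = np.random.default_rng(0)
forecast = 0.5 + 0.3 * rng.standard_normal(64)  # toy forecast ensemble
y_obs = 2.0                                     # toy observation
analysis = assimilate(forecast, y_obs)
roundtrip = integrate(integrate(forecast, 1.0, 0.3), 0.3, 1.0)
```

In this toy setting, inverting and re-sampling without guidance approximately returns the forecast ensemble, which is how forecast information enters without retraining the prior, while switching on guidance pulls the ensemble toward the observation and keeps its spread nonzero. A learned velocity field and an approximate likelihood gradient would replace `velocity` and `guidance` in a realistic setup.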
