Deep Generative Data Assimilation in Multimodal Setting

Yongquan Qu; Juan Nathaniel; Shuolin Li; Pierre Gentine

TL;DR

This work addresses the challenge of robust data assimilation in multimodal, nonlinear Earth system contexts by reframing assimilation as a conditional generation task in a latent, diffusion-based model. SLAMS learns a unified latent space via an encoder/decoder pair and conditions generation with a score-based diffusion process, enabling sampling from $p(x(0) \mid y)$ while quantifying uncertainty. The key contributions include (i) a latent multimodal fusion scheme that obviates explicit observation operators, (ii) a scalable, stable diffusion-based conditioning mechanism with Bayes-consistent scores, and (iii) extensive ablations showing resilience to low-resolution, noisy, and sparse observations, with ex-situ satellite data providing notable improvements for ToA variables. Overall, SLAMS offers a principled, data-driven probabilistic framework for multimodal data assimilation in real-world Earth system modeling, with demonstrated potential to enhance next-generation simulators and uncertainty-aware forecasts.
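The phrase "Bayes-consistent scores" can be read through the standard decomposition of a posterior score. As a sketch, writing $z_t$ for the noised latent state at diffusion time $t$ (a symbol assumed here for illustration, not taken from the summary above), Bayes' rule gives:

$$
\nabla_{z_t} \log p(z_t \mid y) \;=\; \nabla_{z_t} \log p(z_t) \;+\; \nabla_{z_t} \log p(y \mid z_t),
$$

because $\log p(y)$ does not depend on $z_t$. In score-based conditional generation, the first term is typically approximated by a learned score network and the second acts as an observation-likelihood correction. How SLAMS approximates the likelihood term is not stated in this summary.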

Abstract

Robust integration of physical knowledge and data is key to improving computational simulations, such as Earth system models. Data assimilation is crucial for achieving this goal because it provides a systematic framework to calibrate model outputs with observations, which can include remote sensing imagery and ground station measurements, with uncertainty quantification. Conventional methods, including Kalman filters and variational approaches, inherently rely on simplifying linear and Gaussian assumptions, and can be computationally expensive. Nevertheless, with the rapid adoption of data-driven methods in many areas of computational sciences, we see the potential of emulating traditional data assimilation with deep learning, especially generative models. In particular, the diffusion-based probabilistic framework overlaps substantially with data assimilation principles: both allow for conditional generation of samples within a Bayesian inverse framework. These models have shown remarkable success in text-conditioned image generation and image-controlled video synthesis. Likewise, one can frame data assimilation as observation-conditioned state calibration. In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting. Specifically, we assimilate in-situ weather station data and ex-situ satellite imagery to calibrate the vertical temperature profiles, globally. Through extensive ablation, we demonstrate that SLAMS is robust even in low-resolution, noisy, and sparse data settings. To our knowledge, our work is the first to apply a deep generative framework to multimodal data assimilation using real-world datasets, an important step toward building robust computational simulators, including the next-generation Earth system models. Our code is available at: https://github.com/yongquan-qu/SLAMS

Paper Structure (15 sections, 11 equations, 8 figures)

Introduction
Methodology
Multimodality in Unified Latent Space
Score-based Data Assimilation in Latent Space
Scalability and Numerical Stability
Experimental Setup
Datasets
Details on Autoencoder
Details on Score Network
Results and Discussion
Ideal Case: Assimilating High Quality Data
Realistic Case: Assimilating Low Quality Data
Toward Consistent Multimodal Assimilation
Multimodal Feature Ablation
Conclusion

Figures (8)

Figure 1: Schematic diagram illustrating the high-level concept of DA. It provides a systematic framework that calibrates background states (model outputs) with multimodal observations (often low-resolution, noisy, and sparse) to produce analysis.
Figure 2: We propose SLAMS - Score-based Latent Assimilation in Multimodal Setting. During training, we fit three sets of models: the encoder, the decoder, and the score network, given multimodal data sources (e.g., background states, in-situ sensors, and ex-situ satellite measurements). During inference, we perform latent space denoising through a reverse SDE process by sampling from a prior Gaussian distribution and conditioning on the encoded background and observations, which are synthetically coarsified, noisified, and sparsified by a differentiable measurement function $\mathcal{A}$ for ablation purposes.
Figure 3: Ideal case where we have high resolution, low noise, dense inputs (4x coarsening, $\sigma^2=0.1$) to calibrate $t@200hpa$. We find that pixel-based DA generates qualitatively better assimilated states.
Figure 4: Realistic case where we have low-resolution inputs (20x coarsening) to calibrate $t@200hpa$. Our latent-based DA approach, SLAMS, is physically more consistent.
Figure 5: Realistic case where we have noisy inputs ($\sigma^2 = 4$) to calibrate $t@1000hpa$. Our latent-based DA approach, SLAMS, is physically more consistent.
...and 3 more figures
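
The Figure 2 caption describes a measurement function $\mathcal{A}$ that coarsifies, noisifies, and sparsifies observations for the ablations. The snippet below is a minimal NumPy sketch of such a degradation, not the authors' implementation: the function names, the block-averaging coarsener, the random keep-mask, and the order of operations are all assumptions made for illustration. Plain NumPy is also not differentiable, whereas the caption describes $\mathcal{A}$ as differentiable.

```python
import numpy as np


def coarsen(field, factor):
    """Block-average a 2D field by an integer factor (hypothetical helper)."""
    h, w = field.shape
    if h % factor or w % factor:
        raise ValueError("field shape must be divisible by factor")
    # Group pixels into factor x factor blocks and average each block.
    return field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def add_noise(field, sigma2, rng):
    """Add zero-mean Gaussian noise with variance sigma2."""
    return field + rng.normal(0.0, np.sqrt(sigma2), size=field.shape)


def sparsify(field, keep_fraction, rng):
    """Keep each pixel with probability keep_fraction; mark the rest as NaN."""
    mask = rng.random(field.shape) < keep_fraction
    return np.where(mask, field, np.nan), mask


def measure(field, factor=4, sigma2=0.1, keep_fraction=1.0, seed=0):
    """Assumed composition: coarsen, then add noise, then sparsify."""
    rng = np.random.default_rng(seed)
    coarse = coarsen(field, factor)
    noisy = add_noise(coarse, sigma2, rng)
    return sparsify(noisy, keep_fraction, rng)


if __name__ == "__main__":
    state = np.arange(64, dtype=float).reshape(8, 8)  # toy stand-in for a temperature field
    obs, mask = measure(state, factor=4, sigma2=0.1, keep_fraction=0.5)
    print(obs.shape, int(mask.sum()))
```

The default values `factor=4` and `sigma2=0.1` mirror the ideal-case settings quoted in the Figure 3 caption; the realistic cases in Figures 4 and 5 would correspond to `factor=20` and `sigma2=4`, given a field whose dimensions are divisible by the factor.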
