Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences
Jing-An Sun, Hang Fan, Junchao Gong, Ben Fei, Kun Chen, Fenghua Ling, Wenlong Zhang, Wanghan Xu, Li Yan, Pierre Gentine, Lei Bai
TL;DR
Align-DA reframes data assimilation as a diffusion-based generative task in a latent space, learning a background-conditioned prior $p(\boldsymbol x|\boldsymbol x_b)$ and refining it via direct preference optimization using rewards for assimilation accuracy, forecast skill, and physical adherence. By coupling a score-based prior with observation guidance and multi-reward Diffusion-DPO, the method yields posterior analyses that are more informative and physically consistent, while reducing the need for manual tuning. Empirical results on ERA5-like data and GDAS observations show that multi-reward alignment consistently improves assimilation and forecast metrics across guidance schemes, though benefits depend on observation density and guidance strength. The approach offers a flexible, data-driven path to encode domain knowledge (e.g., geostrophic balance) as soft constraints, potentially extending to online RL and broader stages of the weather forecasting workflow.
Abstract
Data assimilation (DA) aims to estimate the full state of a dynamical system by combining partial and noisy observations with a prior model forecast, commonly referred to as the background. In atmospheric applications, this problem is fundamentally ill-posed due to the sparsity of observations relative to the high-dimensional state space. Traditional methods address this challenge by regularizing the solution with simplified background priors, which are empirical and require continual tuning in practice. Inspired by alignment techniques in text-to-image diffusion models, we propose Align-DA, which formulates DA as a generative process and uses reward signals to guide background priors, replacing manual tuning with data-driven alignment. Specifically, we train a score-based model in the latent space to approximate the background-conditioned prior, and align it using three complementary reward signals for DA: (1) assimilation accuracy, (2) forecast skill initialized from the assimilated state, and (3) physical adherence of the analysis fields. Experiments with multiple reward signals demonstrate consistent improvements in analysis quality across different evaluation metrics and observation-guidance strategies. These results show that preference alignment, implemented as a soft constraint, can automatically adapt complex background priors tailored to DA, offering a promising new direction for advancing the field.
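
To make the alignment step concrete, the following is a minimal sketch of a Diffusion-DPO-style preference loss, written under stated assumptions rather than taken from the Align-DA implementation. It assumes that, for each pair of candidate analyses, one has already been ranked as preferred by some reward (for example, assimilation accuracy), and that per-sample squared denoising errors are available for both the trainable model and a frozen reference model. The function name `diffusion_dpo_loss`, its argument names, and the single scalar `beta` (which here absorbs any timestep weighting) are hypothetical.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def diffusion_dpo_loss(err_model_win, err_ref_win,
                       err_model_lose, err_ref_lose, beta=1.0):
    """Preference loss over a batch of (preferred, dispreferred) pairs.

    Each err_* argument is an array of per-sample squared denoising
    errors ||eps - eps_hat||^2 at a sampled noise level:
      *_model_* : trainable score/denoising model
      *_ref_*   : frozen reference model (the pre-alignment prior)
      *_win     : the sample the reward ranks higher
      *_lose    : the sample the reward ranks lower
    beta is an assumed scalar controlling how far the aligned model
    may drift from the reference.
    """
    # How much worse (positive) or better (negative) the model denoises
    # each sample compared with the reference.
    win_gap = np.asarray(err_model_win) - np.asarray(err_ref_win)
    lose_gap = np.asarray(err_model_lose) - np.asarray(err_ref_lose)

    # The loss decreases when the model improves on the preferred sample
    # more than it improves on the dispreferred one.
    logits = -beta * (win_gap - lose_gap)
    return float(np.mean(-log_sigmoid(logits)))


# Illustrative values only: the model matches the reference on the
# dispreferred samples and denoises the preferred samples better.
loss_neutral = diffusion_dpo_loss([1.0, 2.0], [1.0, 2.0], [1.5, 0.5], [1.5, 0.5])
loss_aligned = diffusion_dpo_loss([0.5, 1.5], [1.0, 2.0], [1.5, 0.5], [1.5, 0.5])
```

When the model and the reference produce identical errors, the loss equals log 2, and it falls below that value as the model fits preferred samples better than the reference does. With several rewards, one plausible option is to build preference pairs separately for each reward and average the resulting losses; whether Align-DA combines rewards this way is not stated in the abstract.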
