Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences

Jing-An Sun, Hang Fan, Junchao Gong, Ben Fei, Kun Chen, Fenghua Ling, Wenlong Zhang, Wanghan Xu, Li Yan, Pierre Gentine, Lei Bai

TL;DR

Align-DA reframes data assimilation as a diffusion-based generative task in a latent space, learning a background-conditioned prior $p(\boldsymbol x|\boldsymbol x_b)$ and refining it via direct preference optimization using rewards for assimilation accuracy, forecast skill, and physical adherence. By coupling a score-based prior with observation guidance and multi-reward Diffusion-DPO, the method yields posterior analyses that are more informative and physically consistent, while reducing the need for manual tuning. Empirical results on ERA5-like data and GDAS observations show that multi-reward alignment consistently improves assimilation and forecast metrics across guidance schemes, though benefits depend on observation density and guidance strength. The approach offers a flexible, data-driven path to encode domain knowledge (e.g., geostrophic balance) as soft constraints, potentially extending to online RL and broader stages of the weather forecasting workflow.
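The coupling of a background-conditioned prior with observation guidance can be read as a Bayesian factorization. The following is a sketch, not a formula quoted from the paper: it assumes the observations $\boldsymbol y$ are conditionally independent of the background $\boldsymbol x_b$ given the state $\boldsymbol x$, and it is written for the clean state rather than for a noised diffusion state.

```latex
p(\boldsymbol x \mid \boldsymbol x_b, \boldsymbol y)
  \;\propto\; p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x \mid \boldsymbol x_b)
\quad\Longrightarrow\quad
\nabla_{\boldsymbol x} \log p(\boldsymbol x \mid \boldsymbol x_b, \boldsymbol y)
  = \nabla_{\boldsymbol x} \log p(\boldsymbol x \mid \boldsymbol x_b)
  + \nabla_{\boldsymbol x} \log p(\boldsymbol y \mid \boldsymbol x)
```

Under this reading, the first score term is what the latent diffusion model learns and preference alignment reshapes, while the second is supplied by the observation-guidance scheme at sampling time.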

Abstract

Data assimilation (DA) aims to estimate the full state of a dynamical system by combining partial and noisy observations with a prior model forecast, commonly referred to as the background. In atmospheric applications, this problem is fundamentally ill-posed due to the sparsity of observations relative to the high-dimensional state space. Traditional methods address this challenge by regularizing the solution with simplified background priors, which are empirical and require continual tuning in practice. Inspired by alignment techniques in text-to-image diffusion models, we propose Align-DA, which formulates DA as a generative process and uses reward signals to guide background priors, replacing manual tuning with data-driven alignment. Specifically, we train a score-based model in the latent space to approximate the background-conditioned prior, and align it using three complementary reward signals for DA: (1) assimilation accuracy, (2) forecast skill initialized from the assimilated state, and (3) physical adherence of the analysis fields. Experiments with multiple reward signals demonstrate consistent improvements in analysis quality across different evaluation metrics and observation-guidance strategies. These results show that preference alignment, implemented as a soft constraint, can automatically adapt complex background priors tailored to DA, offering a promising new direction for advancing the field.
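The alignment step can be illustrated with a pairwise preference loss of the kind used in Diffusion-DPO for text-to-image models. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the argument names, and the choice to fold any timestep weighting into `beta` are all hypothetical, and the text here does not specify Align-DA's exact objective or how preferred and rejected analyses are selected from the three rewards.

```python
import numpy as np

def diffusion_dpo_loss(err_w_policy, err_w_ref, err_l_policy, err_l_ref, beta=1.0):
    """Pairwise Diffusion-DPO-style loss (illustrative sketch).

    Each err_* is a per-pair squared noise-prediction error:
      *_w_*  -> preferred ("winning") sample, *_l_* -> rejected ("losing") sample
      *_policy -> model being aligned,        *_ref -> frozen reference model
    beta is an assumed scalar that absorbs any timestep weighting.
    """
    err_w_policy = np.asarray(err_w_policy, dtype=float)
    err_w_ref = np.asarray(err_w_ref, dtype=float)
    err_l_policy = np.asarray(err_l_policy, dtype=float)
    err_l_ref = np.asarray(err_l_ref, dtype=float)

    # Negative margin: the policy beats the reference by more on the
    # preferred sample than on the rejected one.
    margin = (err_w_policy - err_w_ref) - (err_l_policy - err_l_ref)

    # -log(sigmoid(-beta * margin)) == log(1 + exp(beta * margin)),
    # evaluated stably with logaddexp.
    return float(np.logaddexp(0.0, beta * margin).mean())
```

When the policy and reference errors are identical, the loss equals $\log 2$; it falls below that value once the policy denoises the preferred sample better, relative to the reference, than it denoises the rejected one.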

Paper Structure

This paper contains 17 sections, 15 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Schematic of Align-DA. Conventional score-based DA learns a background-conditioned diffusion model to approximate a broad prior distribution $p(\pmb x_{\text{truth}} \mid \pmb x_b)$, which, after incorporating observations, yields a posterior estimate $p(\pmb x_{\text{truth}} \mid \pmb x_b, \pmb y)$ that may be overly dispersed (left). Align-DA leverages reward-guided alignment to adaptively refine and concentrate the prior $p_{\text{align}}(\pmb x_{\text{truth}} \mid \pmb x_b)$, resulting in a posterior $p_{\text{align}}(\pmb x_{\text{truth}} \mid \pmb x_b, \pmb y)$ that better reflects observational constraints and more closely aligns with the requirements of DA (right).
  • Figure 2: Visualization of Align-DA assimilation performance for Z500 analysis (averaged over 10 analyses at randomly selected timestamps in 2019). 'ObsFree' denotes the baseline without observation integration. 'Repaint' and 'DPS' denote the Repaint and diffusion posterior sampling guidance methods. The suffix '-P' indicates single physical reward alignment, while '-M' indicates multi-reward alignment. First row: Reference fields showing ERA5 reanalysis (ground truth, denoted GT), background field, and observational data. Second row: Error reduction through single physical reward alignment, quantified as $|\pmb x_a^{\text{ref}}-\pmb x_{\text{GT}}|-|\pmb x_a^{\text{Align-P}}-\pmb x_{\text{GT}}|$. Warm hues indicate regions where physical adherence alignment moderately improves accuracy, suggesting potential reward correlations. Third row: Error reduction through multi-reward alignment, quantified as $|\pmb x_a^{\text{ref}}-\pmb x_{\text{GT}}|-|\pmb x_a^{\text{Align-M}}-\pmb x_{\text{GT}}|$. More intense warm hues indicate regions where multi-reward alignment substantially improves accuracy compared to the unaligned baseline. The attenuation of color intensity from the left to the right columns shows that the effect of DPO diminishes as more effective observation guidance methods are applied, a pattern consistent with theoretical expectations.
  • Figure 3: Geo-Score comparisons across alignment strategies: (a) Reference model, (b) Multi-reward aligned (-M), and (c) Single physical adherence reward aligned (-P) results, with colors denoting different guidance methods.
  • Figure 4: Interaction between observation constraints and alignment. Better incorporation of observation information (i.e., stronger constraints) leads to a narrower space for improvement by the alignment process.
  • Figure 5: Visualization of u700 at 2019-01-26-18:00 UTC.
  • ...and 3 more figures
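
The error-reduction maps described for Figure 2 follow the pointwise quantity $|\pmb x_a^{\text{ref}}-\pmb x_{\text{GT}}|-|\pmb x_a^{\text{Align}}-\pmb x_{\text{GT}}|$. A minimal sketch of that computation is given below; the function name and the toy arrays are illustrative assumptions, not taken from the paper's code or data.

```python
import numpy as np

def error_reduction(x_ref, x_aligned, x_gt):
    """Pointwise |x_ref - x_gt| - |x_aligned - x_gt|.

    Positive values mark grid points where the aligned analysis is closer
    to the ground truth than the unaligned reference analysis.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_aligned = np.asarray(x_aligned, dtype=float)
    x_gt = np.asarray(x_gt, dtype=float)
    return np.abs(x_ref - x_gt) - np.abs(x_aligned - x_gt)

# Toy 1-D "field" with made-up values, for illustration only.
x_gt = np.array([0.0, 0.0])
x_ref = np.array([1.0, -2.0])
x_aligned = np.array([0.5, -3.0])
reduction = error_reduction(x_ref, x_aligned, x_gt)
```

In this toy case the first point improves (positive value) and the second degrades (negative value), which corresponds to warm and cool hues respectively in a diverging colormap.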