Table of Contents
Fetching ...

DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation

Hao Wang, Zixuan Weng, Jindong Han, Wei Fan, Hao Liu

TL;DR

DAMBench delivers the first large-scale, multi-modal benchmark for deep learning–assisted atmospheric data assimilation using real-world ERA5-based backgrounds and observations from weather stations and satellites. It provides standardized tasks, a transparent evaluation protocol, and a lightweight multi-modal representation adapter to test observation fusion, enabling fair comparisons across DL-DA methods. Empirical results show that current neural-process and generative approaches improve significantly over background forecasts, with multi-modal observations yielding additional gains and validating the benchmark’s realism and utility. The work promotes reproducibility and extensibility toward operational, real-world DA systems and lays groundwork for future Earth-system AI research and uncertainty-aware inference.

Abstract

Data Assimilation is a cornerstone of atmospheric system modeling, tasked with reconstructing system states by integrating sparse, noisy observations with prior estimation. While traditional approaches like variational and ensemble Kalman filtering have proven effective, recent advances in deep learning offer more scalable, efficient, and flexible alternatives better suited for complex, real-world data assimilation involving large-scale and multi-modal observations. However, existing deep learning-based DA research suffers from two critical limitations: (1) reliance on oversimplified scenarios with synthetically perturbed observations, and (2) the absence of standardized benchmarks for fair model comparison. To address these gaps, in this work, we introduce DAMBench, the first large-scale multi-modal benchmark designed to evaluate data-driven DA models under realistic atmospheric conditions. DAMBench integrates high-quality background states from state-of-the-art forecasting systems and real-world multi-modal observations (i.e., real-world weather stations and satellite imagery). All data are resampled to a common grid and temporally aligned to support systematic training, validation, and testing. We provide unified evaluation protocols and benchmark representative data assimilation approaches, including latent generative models and neural process frameworks. Additionally, we propose a lightweight multi-modal plugin to demonstrate how integrating realistic observations can enhance even simple baselines. Through comprehensive experiments, DAMBench establishes a rigorous foundation for future research, promoting reproducibility, fair comparison, and extensibility to real-world multi-modal scenarios. Our dataset and code are publicly available at https://github.com/figerhaowang/DAMBench.

DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation

TL;DR

DAMBench delivers the first large-scale, multi-modal benchmark for deep learning–assisted atmospheric data assimilation using real-world ERA5-based backgrounds and observations from weather stations and satellites. It provides standardized tasks, a transparent evaluation protocol, and a lightweight multi-modal representation adapter to test observation fusion, enabling fair comparisons across DL-DA methods. Empirical results show that current neural-process and generative approaches improve significantly over background forecasts, with multi-modal observations yielding additional gains and validating the benchmark’s realism and utility. The work promotes reproducibility and extensibility toward operational, real-world DA systems and lays groundwork for future Earth-system AI research and uncertainty-aware inference.

Abstract

Data Assimilation is a cornerstone of atmospheric system modeling, tasked with reconstructing system states by integrating sparse, noisy observations with prior estimation. While traditional approaches like variational and ensemble Kalman filtering have proven effective, recent advances in deep learning offer more scalable, efficient, and flexible alternatives better suited for complex, real-world data assimilation involving large-scale and multi-modal observations. However, existing deep learning-based DA research suffers from two critical limitations: (1) reliance on oversimplified scenarios with synthetically perturbed observations, and (2) the absence of standardized benchmarks for fair model comparison. To address these gaps, in this work, we introduce DAMBench, the first large-scale multi-modal benchmark designed to evaluate data-driven DA models under realistic atmospheric conditions. DAMBench integrates high-quality background states from state-of-the-art forecasting systems and real-world multi-modal observations (i.e., real-world weather stations and satellite imagery). All data are resampled to a common grid and temporally aligned to support systematic training, validation, and testing. We provide unified evaluation protocols and benchmark representative data assimilation approaches, including latent generative models and neural process frameworks. Additionally, we propose a lightweight multi-modal plugin to demonstrate how integrating realistic observations can enhance even simple baselines. Through comprehensive experiments, DAMBench establishes a rigorous foundation for future research, promoting reproducibility, fair comparison, and extensibility to real-world multi-modal scenarios. Our dataset and code are publicly available at https://github.com/figerhaowang/DAMBench.

Paper Structure

This paper contains 45 sections, 33 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: (a) DA recovers the system state (analysis states) by combining prior estimation (background states) with sparse and noisy observations (b) Existing methods often oversimplify the observation process by generating inputs via synthetic perturbations of background states, failing to capture the statistical and spatial structure of real-world measurements. (c) In contrast, our benchmark adopts realistic observational masks derived from reanalysis data and evaluates performance under operationally relevant conditions.
  • Figure 2: Overview of DAMBench's real-world observations with multi-modal information. We collected multi-modal real-world observations from both satellites and observatories, enabling more realistic exploration towards deep learning-based data assimilation.
  • Figure 3: Overall framework of multi-modal representation adapter, a lightweight plugin for the integration of multi-modal observations.