Benchmarking AI-based data assimilation to advance data-driven global weather forecasting

Wuxin Wang; Weicheng Ni; Ben Fei; Tao Han; Lilan Huang; Taikang Yuan; Xiaoyong Li; Lei Bai; Boheng Duan; Kaijun Ren

TL;DR

DABench addresses the lack of objective, real-world benchmarks for AI-based data assimilation (DA) in global weather forecasting by unifying ERA5 reanalysis and GDAS prepbufr observations into a standardized, open benchmarking platform. It evaluates both deterministic and ensemble DA configurations and uses Pangu-Weather to assess the impact of AI-generated analyses on medium-range forecasts, with dual validation against ERA5 and independent radiosondes. Across a one-year DA cycle and a 10-day forecast horizon, AI-based DA methods, especially 4DVarFormer, remain robust and perform competitively with state-of-the-art AI-driven 4DVar frameworks, highlighting the potential for autonomous, data-driven global forecasting. The study also identifies limitations, such as the lack of satellite radiances and resolution constraints, and outlines future directions (physics-informed AI, self-supervised learning, and hybrid AI-physics approaches) for moving toward operational capability.

Abstract

Research on Artificial Intelligence (AI)-based Data Assimilation (DA) is expanding rapidly. However, the absence of an objective, comprehensive, and real-world benchmark hinders the fair comparison of diverse methods. Here, we introduce DABench, a benchmark designed to support the development and evaluation of AI-based DA methods. By integrating real-world observations, DABench provides an objective and fair platform for validating long-term closed-loop DA cycles, supporting both deterministic and ensemble configurations. Furthermore, we assess how effectively AI-based DA generates initial conditions from which an advanced AI-based weather forecasting model can produce accurate medium-range global weather forecasts. Our dual validation, which uses both reanalysis data and independent radiosonde observations, demonstrates that AI-based DA achieves performance competitive with state-of-the-art AI-driven four-dimensional variational frameworks across both global weather DA and medium-range forecasting metrics. We invite the research community to use DABench to accelerate the advancement of AI-based DA for global weather forecasting.

Paper Structure (21 sections, 19 equations, 11 figures, 3 tables)

Introduction
Results
The motivation and framework of DABench
Results of the one-year DA cycle
Evaluation using ERA5 as the reference
Evaluation using independent radiosondes as the reference
Results of the 10-day medium-range weather forecasting
Power Spectra of the analysis fields
Visualization of the analysis fields
Discussion
Methods
General problem definition
Datasets
Global weather reference
Background field
...and 6 more sections

Figures (11)

Figure 1: Overview of DABench. (a) Schematic illustration highlighting the significance of the benchmark for AI-based DA research. (b) Framework of DABench. The data-driven medium-range global weather forecasting system consists of two core components: the forecasting component, which utilizes the Pangu-Weather model (Bi et al., 2023) for generating forecasts, and the DA component, which integrates the background field and observations using the DA baselines evaluated in this study to produce the analysis required for initializing the forecasting task. The system is developed and evaluated using both ERA5 reanalysis and independent radiosonde observations, as well as deterministic and ensemble DA cycle configurations. The DA models are trained using real-world observations to approximate the ERA5 reanalysis, and are then evaluated against ERA5 for overall performance and against independent radiosonde observations for their ability to estimate the real atmosphere. Finally, medium-range weather forecasting is assessed using their outputs as initial fields to evaluate their potential for operational applications.
Figure 2: WRMSE of baselines over a one-year DA cycle using ERA5 as reference. The 5-day Pangu forecast is depicted by a black dashed line. The AI-based DA results are color-coded as follows: SwinTransformer (dark blue), 4DVarNet (light blue), 4DSRDA (dark yellow-green), Adas (light yellow-green), L4DVar (brown), SDA (purple), and 4DVarFormer (red). Evaluations were performed at 00:00 and 12:00 UTC daily throughout the year. Each subplot corresponds to a distinct variable, as indicated by the title.
Figure 3: CRPS of baselines over a one-year DA cycle using ERA5 as reference. The AI-based DA results are color-coded as follows: SwinTransformer (dark blue), 4DVarNet (light blue), 4DSRDA (dark yellow-green), Adas (light yellow-green), L4DVar (brown), SDA (purple), and 4DVarFormer (red). Evaluations were performed at 00:00 and 12:00 UTC daily throughout the year. Each subplot corresponds to a distinct variable, as indicated by the title.
Figure 4: WBias of baselines over a one-year DA cycle using ERA5 as reference. The AI-based DA results are color-coded as follows: SwinTransformer (dark blue), 4DVarNet (light blue), 4DSRDA (dark yellow-green), Adas (light yellow-green), L4DVar (brown), SDA (purple), and 4DVarFormer (red). Evaluations were performed at 00:00 and 12:00 UTC daily throughout the year. Each subplot corresponds to a distinct variable, as indicated by the title.
Figure 5: ORMSE of baselines over a one-year DA cycle using independent radiosonde observations as reference. The 5-day Pangu forecast is depicted by a black dashed line. The AI-based DA results are color-coded as follows: SwinTransformer (dark blue), 4DVarNet (light blue), 4DSRDA (dark yellow-green), Adas (light yellow-green), L4DVar (brown), SDA (purple), and 4DVarFormer (red). Evaluations were performed at 00:00 and 12:00 UTC daily throughout the year. Each subplot corresponds to a distinct variable, as indicated by the title.
...and 6 more figures
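
The captions above report WRMSE, WBias, and CRPS without defining them on this page. The sketch below shows how such scores are commonly computed for gridded global fields. It assumes cosine-latitude weighting normalized to a mean of 1 and the standard ensemble estimator for CRPS. These conventions are assumptions, and the paper's exact definitions may differ. All function names are hypothetical.

```python
import math


def latitude_weights(lats_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed convention)."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]


def wrmse(field, reference, lats_deg):
    """Latitude-weighted RMSE. field[i][j]: latitude row i, longitude column j."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, row, ref_row in zip(weights, field, reference):
        for x, y in zip(row, ref_row):
            total += w * (x - y) ** 2
            count += 1
    return math.sqrt(total / count)


def wbias(field, reference, lats_deg):
    """Latitude-weighted mean error (field minus reference)."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, row, ref_row in zip(weights, field, reference):
        for x, y in zip(row, ref_row):
            total += w * (x - y)
            count += 1
    return total / count


def ensemble_crps(members, truth):
    """CRPS at one grid point: mean|x_i - y| - 0.5 * mean|x_i - x_j|."""
    m = len(members)
    skill = sum(abs(x - truth) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread


# Toy 3 x 2 grid: the analysis is uniformly 1 unit warmer than the reference.
lats = [-60.0, 0.0, 60.0]
analysis = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
reference = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(wrmse(analysis, reference, lats), wbias(analysis, reference, lats))
print(ensemble_crps([0.0, 2.0], 1.0))
```

A field-level CRPS would average `ensemble_crps` over grid points, optionally with the same latitude weights. For a single-member (deterministic) analysis, the estimator reduces to the absolute error.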
