A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang

TL;DR

This work tackles the challenge of operationalizing AI-based weather and climate predictions by embedding real-time data assimilation into the forecasting workflow. It introduces EnSF, a training-free ensemble filter based on diffusion-model scores, and a ViT surrogate that can represent either physics-based models or AI foundation models and adapts online to incoming observations. Demonstrations on the Frontier supercomputer show that EnSF significantly outperforms the state-of-the-art LETKF on nonlinear, high-dimensional SQG turbulence, and that the framework exhibits both strong and weak scaling up to 1024 GPUs with compute-efficient ViT training. The framework makes real-time assimilation available to AI-driven forecasts and lays groundwork for integration at operational centers such as NOAA and ECMWF.

Abstract

The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not yet ready for operational use in weather forecasting or climate prediction. The reason is that their workflows lack a data assimilation method for ingesting incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer (ViT)-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the exascale supercomputer Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.
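
As an illustration of the forecast-then-analysis cycle that the abstract describes, the following is a minimal Python sketch on a toy scalar system. It is not the authors' implementation: the analysis step uses a stochastic (perturbed-observation) ensemble Kalman filter as a simple stand-in for EnSF and LETKF, the unstable linear `forecast` function stands in for the SQG or ViT forecast model, and every function name and parameter value below is hypothetical.

```python
import random
import statistics


def forecast(x, rng, growth=1.1, model_noise_std=0.1):
    """Toy unstable linear forecast model (assumption; stands in for SQG or ViT)."""
    return growth * x + rng.gauss(0.0, model_noise_std)


def enkf_analysis(ensemble, obs, obs_noise_std, rng):
    """Stochastic EnKF update for a directly observed scalar state.

    This is a stand-in analysis step, not EnSF and not LETKF.
    """
    prior_var = statistics.variance(ensemble)            # sample forecast variance
    gain = prior_var / (prior_var + obs_noise_std ** 2)  # scalar Kalman gain
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_noise_std) - x) for x in ensemble]


def run_cycles(assimilate, cycles=50, n_members=20, obs_noise_std=0.1, seed=0):
    """Run sequential forecast/analysis cycles and return the ensemble-mean RMSE."""
    truth_rng = random.Random(seed)      # drives the hidden true trajectory
    ens_rng = random.Random(seed + 1)    # drives the ensemble forecasts
    obs_rng = random.Random(seed + 2)    # drives observation noise and perturbations
    truth = 1.0
    # Biased, spread-out initial ensemble: the initial state is poorly known.
    ensemble = [truth + 0.5 + ens_rng.gauss(0.0, 1.0) for _ in range(n_members)]
    squared_errors = []
    for _ in range(cycles):
        truth = forecast(truth, truth_rng)                   # nature run
        ensemble = [forecast(x, ens_rng) for x in ensemble]  # forecast step
        if assimilate:
            obs = truth + obs_rng.gauss(0.0, obs_noise_std)  # incoming observation
            ensemble = enkf_analysis(ensemble, obs, obs_noise_std, obs_rng)
        squared_errors.append((statistics.fmean(ensemble) - truth) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


rmse_with_da = run_cycles(assimilate=True)
rmse_free_run = run_cycles(assimilate=False)
print(f"RMSE with assimilation: {rmse_with_da:.3f}")
print(f"RMSE without assimilation: {rmse_free_run:.3f}")
```

Because the toy model amplifies errors at every step, the free run should drift away from the truth while the cycled run stays close to the observations. This only mirrors the qualitative argument for data assimilation and says nothing about the relative accuracy of EnSF and LETKF.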

Paper Structure (24 sections, 18 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Illustration of the real-time sequential DA workflow, which needs to be performed very frequently (e.g., every hour) in weather forecast operation. Recent advances in weather and climate modeling focus on developing AI-based foundation models, e.g., FourCastNet, GraphCast, etc., to replace the traditional physics-based forecast models. These data-driven architectures are not yet ready for operational use due to the lack of real-time data assimilation capabilities. The proposed DA framework has two primary modules that need to be scaled on HPC, i.e., the ensemble score filter (EnSF) introduced in Section \ref{sec:filter}, which significantly outperforms SOTA methods like LETKF, and a vision transformer (ViT)-based surrogate, introduced in Section \ref{sec:ViT}, capable of real-time adaptation through the integration of observational data. Our method can be integrated with either physics models or AI-based foundation models. The scalability of our method on HPC is essential to ensure computations can be performed in real time.
  • Figure 2: Building block of the ViT surrogate model for the forecast model in Figure 1. The number of parameters and floating point operations (FLOPs) are exemplified with 8-head attention, an embedding dimension of 2048, and an MLP to attention ratio of 8.
  • Figure 3: Compute requirements, in FLOPs and Frontier node hours, for training the ViT surrogate of the SQG model on 1M images.
  • Figure 4: The root mean squared errors (RMSEs) of the four test cases. We observe that data assimilation is a necessary component to ensure accurate reconstruction of the SQG state: the RMSE of experiments that use only SQG or ViT without a DA component grows very fast in time. Moreover, LETKF diverges from the ground truth as model errors accumulate in time, suggesting that the LETKF method is sensitive to model imperfections. The proposed EnSF+ViT framework performs best, with a stable RMSE throughout all analysis cycles even in the absence of fine-tuning.
  • Figure 5: The top row shows the analysis ensemble means from SQG only, ViT only, LETKF+SQG and EnSF+ViT alongside the ground truth potential temperature field at the final observation time, i.e., $t = 3600$. The analysis mean errors of the four experiments are displayed on the bottom row. We confirm that pure physics-based or AI-based model predictions without data assimilation cannot provide an accurate long-term reconstruction of the SQG state due to the rapid growth of initial errors in chaotic dynamical systems. The SOTA LETKF method captures the overall large-scale pattern but fails to represent small-scale features. The proposed EnSF+ViT offers the best accuracy, consistent with the RMSE statistics shown in Figure 4.
  • ...and 5 more figures
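
To make the configuration quoted in the Figure 2 caption concrete, the sketch below counts the learnable parameters of one standard transformer block. It rests on assumptions that the caption does not confirm: the "MLP to attention ratio of 8" is read as an MLP hidden width of 8 times the embedding dimension, all linear layers carry biases, and the block has two layer norms. The function name `vit_block_params` is hypothetical, and the result need not match the numbers in the paper's figure.

```python
def vit_block_params(embed_dim: int, mlp_ratio: int) -> dict:
    """Count learnable parameters in one standard transformer block (assumed layout)."""
    d = embed_dim
    hidden = mlp_ratio * d  # assumed reading of the "MLP to attention ratio"
    # Query, key and value projections plus the output projection, each d x d with bias.
    # The head count does not change this total: the heads only partition the d columns.
    attention = 3 * (d * d + d) + (d * d + d)
    # Two linear layers: d -> hidden and hidden -> d, each with bias.
    mlp = (d * hidden + hidden) + (hidden * d + d)
    # Two layer norms, each with a scale and a shift vector of length d.
    layernorm = 2 * (2 * d)
    return {
        "attention": attention,
        "mlp": mlp,
        "layernorm": layernorm,
        "total": attention + mlp + layernorm,
    }


counts = vit_block_params(embed_dim=2048, mlp_ratio=8)
for name, value in counts.items():
    print(f"{name:>10}: {value:,}")
```

Under these assumptions a single block holds roughly 84 million parameters, about 80% of them in the MLP, which is why the MLP ratio matters so much for the compute budget.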