Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet
Melissa Adrian, Daniel Sanz-Alonso, Rebecca Willett
TL;DR
This study demonstrates that online data assimilation using a machine-learning weather surrogate (FourCastNet) within a 3DVar framework can yield stable, high-quality analyses over year-long horizons despite long-term surrogate instability and sparse, noisy observations. It provides a theoretical long-time accuracy bound, showing that short-term surrogate accuracy suffices when observations are sufficiently informative. Empirically, 3DVar analyses offer better initialization for forecasting than naive observation interpolation and can effectively support extreme-event prediction, as illustrated by Typhoon Mawar. The results suggest substantial practical potential for combining fast ML surrogates with variational data assimilation to enable accurate, real-time, large-scale weather analyses and forecasts at reduced computational cost.
Abstract
Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but produce inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate over long time horizons. As a case study, we integrate FourCastNet, a weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
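To make the forecast-then-correct cycle concrete, the following is a minimal sketch of a 3DVar filtering loop. It is an illustration under simplifying assumptions and not the configuration used in the case study: the background and observation-noise covariances are taken to be scalar multiples of the identity, the observation operator simply selects a subset of state components, and the surrogate is a toy map standing in for FourCastNet. The names analysis_step, run_3dvar, obs_idx, bg_var, and obs_var are hypothetical.

```python
from typing import Callable, List, Sequence

State = List[float]


def analysis_step(forecast: Sequence[float], obs: Sequence[float],
                  obs_idx: Sequence[int], bg_var: float,
                  obs_var: float) -> State:
    """One 3DVar analysis with diagonal covariances (an assumption here).

    Minimizes |v - forecast|^2 / bg_var + |y - H v|^2 / obs_var, where H
    selects the components listed in obs_idx. The minimizer moves each
    observed component toward its observation with a fixed gain and
    leaves unobserved components at their forecast value.
    """
    gain = bg_var / (bg_var + obs_var)
    analysis = list(forecast)
    for y, i in zip(obs, obs_idx):
        analysis[i] = forecast[i] + gain * (y - forecast[i])
    return analysis


def run_3dvar(v0: Sequence[float], observations: Sequence[Sequence[float]],
              surrogate: Callable[[State], State], obs_idx: Sequence[int],
              bg_var: float, obs_var: float) -> List[State]:
    """Alternate a surrogate forecast with a 3DVar analysis per observation."""
    v = list(v0)
    analyses = []
    for y in observations:
        forecast = surrogate(v)  # predict with the (possibly unstable) surrogate
        v = analysis_step(forecast, y, obs_idx, bg_var, obs_var)  # correct
        analyses.append(v)
    return analyses


# Toy usage: a surrogate that grows by 50% per step when run freely,
# corrected at every cycle by noise-free observations of a truth fixed at zero.
toy_surrogate = lambda v: [1.5 * x for x in v]
analyses = run_3dvar([1.0, 1.0], [[0.0, 0.0]] * 3, toy_surrogate,
                     obs_idx=[0, 1], bg_var=1.0, obs_var=1.0)
```

In this toy setting the gain is 0.5, so each cycle multiplies the error by 1.5 × 0.5 = 0.75 and the analyses contract toward the truth even though the free-running surrogate diverges. With diagonal covariances, unobserved components receive no correction; propagating information from sparse observations to unobserved components would require a background covariance with off-diagonal structure.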
