Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet

Melissa Adrian; Daniel Sanz-Alonso; Rebecca Willett

TL;DR

This study demonstrates that online data assimilation using a machine-learning weather surrogate (FourCastNet) within a 3DVar framework can yield stable, high-quality analyses over year-long horizons despite long-term surrogate instability and sparse, noisy observations. It provides a theoretical long-time accuracy bound, showing that short-term surrogate accuracy suffices when observations are sufficiently informative. Empirically, 3DVar analyses offer better initialization for forecasting than naive observation interpolation and can effectively support extreme-event prediction, as illustrated by Typhoon Mawar. The results suggest substantial practical potential for combining fast ML surrogates with variational data assimilation to enable accurate, real-time, large-scale weather analyses and forecasts at reduced computational cost.

Abstract

Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
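The filtering idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation: a hypothetical two-variable linear map stands in for FourCastNet, only the first coordinate is observed, and the constant gain `(1.0, 0.0)` is an arbitrary choice made for illustration. It uses the standard fixed-gain 3DVar update, forecast plus gain times innovation, and compares it with running the surrogate alone; the paper's exact formulation may differ.

```python
import math
import random


def surrogate(v):
    """Hypothetical stand-in for the surrogate model F (not FourCastNet).

    The first coordinate is unstable (growth factor 1.1); the second is stable.
    """
    x, z = v
    return (1.1 * x + 0.2 * z, 0.5 * z)


def truth_step(u, model_error=0.01):
    """Assumed true dynamics: the surrogate plus a small systematic error."""
    x, z = surrogate(u)
    return (x + model_error, z - model_error)


def observe(u, rng, noise_std=0.01):
    """Partial, noisy observation: only the first coordinate is seen."""
    return u[0] + rng.gauss(0.0, noise_std)


def threedvar_step(v, y, gain=(1.0, 0.0)):
    """One fixed-gain 3DVar update: forecast, then correct with the innovation."""
    fx, fz = surrogate(v)
    innovation = y - fx  # observation minus observed part of the forecast
    return (fx + gain[0] * innovation, fz + gain[1] * innovation)


def run(steps=60, seed=0):
    """Return (analysis error, free-run error) after `steps` assimilation cycles."""
    rng = random.Random(seed)
    truth = (1.0, 1.0)
    analysis = (0.0, 0.0)   # deliberately wrong initial guess
    free_run = (0.5, 0.5)   # surrogate propagated without any observations
    for _ in range(steps):
        truth = truth_step(truth)
        y = observe(truth, rng)
        analysis = threedvar_step(analysis, y)
        free_run = surrogate(free_run)
    analysis_err = math.hypot(truth[0] - analysis[0], truth[1] - analysis[1])
    free_err = math.hypot(truth[0] - free_run[0], truth[1] - free_run[1])
    return analysis_err, free_err


if __name__ == "__main__":
    analysis_err, free_err = run()
    print(f"3DVar analysis error: {analysis_err:.4f}")
    print(f"free-run surrogate error: {free_err:.4f}")
```

In this toy setting the observed coordinate is the unstable one, so the correction removes the growing error while the unobserved coordinate contracts on its own. The free-running surrogate, by contrast, drifts away from the truth.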

Paper Structure (40 sections, 1 theorem, 27 equations, 21 figures, 1 table)

Introduction
Contributions
Related work
Data-driven weather forecasting in data assimilation
Extreme event forecasting using data-driven weather surrogates.
Stability theory of 3DVar accuracy.
Data description
ECMWF Reanalysis v5 (ERA5)
Ground truth states
Observations $y_t$.
High-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF)'s Integrated Forecasting System (IFS)
Observational typhoon data from the International Best Track Archive for Climate Stewardship (IBTrACS)
Methodology
Setting
Observation operator $H$.
...and 25 more sections

Key Result

Theorem 1

Suppose Assumption assumption:obs holds. Additionally suppose that the Kalman gain matrix $K$ in eq:surrogate_3dvar satisfies, for some constant $\lambda \in (0,1)$, [...] where $D\mathcal{F}$ denotes the Jacobian matrix of $\mathcal{F}$. Suppose further that [...] Then there exists a constant $c>0$, independent of $\gamma$, $\lambda$, and $\epsilon$, such that the surrogate 3DVar algorithm satisfies [...]

Figures (21)

Figure 1: The dotted lines in both (a) and (b) correspond to metrics for interpolated noisy observations at each time point, and solid lines correspond to metrics for the 3DVar analyses. These metrics are computed using standardized ERA5 data and standardized predictions, and the results are reported as average standardized errors across our 20 atmospheric features. These results show that our 3DVar analyses yield lower RMSE and higher ACC metrics across a year compared to interpolating raw observations. Furthermore, our 3DVar analyses using low-resolution observations achieve stable metrics up to a 5$^\circ$ resolution. At the $5^\circ$ observation resolution, the analysis can be unstable, and we display metrics only up to the time that the instability was detected.
Figure 2: Visualization of the ground truth ERA5 data, interpolated $4.5^\circ$ ERA5 observations with standardized $N(0,0.0001 I_{d_y})$ distributed additive errors, and our 3DVar analysis using this observational data and FourCastNet for the atmospheric features total column water vapor (TCWV), U-component wind speed at 10m above the surface (U 10m), and relative humidity at 500 hPa (RH 500hPa) at the end of our assimilation horizon, December 31, 2023 at 18:00 UTC.
Figure 3: Visualization of various forecasting initializations for the task of $h$-step-ahead forecasting. An initialization at time $t'$ is used to autoregressively compute forecasts up to $h$ time steps ahead using $\mathcal{F}_\text{FCN}$, FourCastNet. The initialization time $t'$ varies between $1\leq t'\leq T-h$ for all tasks (a)-(d). We consider forecasting using (a) interpolated observations, (b) true ERA5 data (unavailable in practice, serving here as an idealized setting), (c) 3DVar analyses, and (d) climatology as initializations. Additionally, $t=0$ corresponds to January 1, 2023 at 00:00 UTC, and $t=T$ corresponds to December 31, 2023 at 18:00 UTC.
Figure 4: Plots of the 120-hour forecasting performance using (a) interpolated $4.5^\circ$ observations, (b) ground truth ERA5 data, (c) 3DVar analyses with $4.5^\circ$ observation resolution, and (d) climatology as initializations. Each line shows the performance at each forecasting horizon, in 6-hour increments, averaged across the different initialization times. The shaded regions correspond to the 0.05 and 0.95 quantiles of the forecasting metrics at each forecasting horizon. We also plot the $t=0$ errors, which correspond to the initialization error prior to forecasting.
Figure 5: Visualization of FourCastNet's 7-day forecast of the estimated eye of Typhoon Mawar initialized on May 23, 2023 00:00 UTC using three different initial conditions: ground truth ERA5 data as an ideal setting (left), our 3DVar analysis using $4.5^\circ$ noisy observations (middle), and interpolated $4.5^\circ$ noisy observations (right). Each standardized initialization is perturbed by $\mathcal{N}(0,0.3 I_{d_x})$ noise to create a 50-member ensemble. These initial ensemble members were then independently propagated forward in time using FourCastNet, with no additional data used to correct the forecasts. For comparison, each plot includes the eye of the typhoon based on ERA5, a single IFS-HRES forecast, and IBTrACS observational data. The skill of the ensemble in predicting the typhoon's trajectory relative to IBTrACS data, measured by the CRPS metric, is listed at the top of each image showing the forecast ensembles.
...and 16 more figures
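
The captions above refer to three metrics: RMSE, ACC (anomaly correlation coefficient), and CRPS. As a rough guide to what each measures, here is a minimal sketch using common textbook definitions on flat lists of numbers. It is an assumption that these match the paper's exact conventions: the sketch applies no latitude weighting, treats CRPS for a scalar quantity rather than a trajectory, and all function names are hypothetical.

```python
import math


def rmse(forecast, truth):
    """Root-mean-square error between two equal-length sequences."""
    n = len(forecast)
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / n)


def acc(forecast, truth, climatology):
    """Anomaly correlation coefficient (unweighted).

    Correlates forecast and truth after subtracting a climatological mean,
    so a value of 1 means the anomalies are perfectly aligned.
    """
    fa = [f - c for f, c in zip(forecast, climatology)]
    ta = [t - c for t, c in zip(truth, climatology)]
    numerator = sum(a * b for a, b in zip(fa, ta))
    denominator = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in ta))
    return numerator / denominator


def ensemble_crps(members, observation):
    """Empirical CRPS of a scalar ensemble: E|X - y| - 0.5 * E|X - X'|.

    Lower is better; for a single member it reduces to the absolute error.
    """
    m = len(members)
    accuracy_term = sum(abs(x - observation) for x in members) / m
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy_term - spread_term


if __name__ == "__main__":
    print(rmse([1.0, 2.0], [1.0, 4.0]))
    print(acc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 0.0]))
    print(ensemble_crps([0.0, 2.0], 1.0))
```

RMSE and ACC score a single deterministic field, whereas CRPS scores a whole ensemble, rewarding members that sit close to the observation without collapsing the spread to zero.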

Theorems & Definitions (2)

Theorem 1
proof
