4D-Var using Hessian approximation and backpropagation applied to automatically-differentiable numerical and machine learning models

Kylen Solvik; Stephen G. Penny; Stephan Hoyer

TL;DR

The paper addresses a practical bottleneck of 4D-Var data assimilation, which traditionally requires developing and maintaining tangent linear and adjoint models. It introduces Backprop-4DVar, which combines a Gauss-Newton-like Hessian approximation with backpropagation inside an automatic differentiation framework. Backprop-4DVar can be applied to any differentiable forecast model, including ML surrogates, with JAX used for the gradient and Hessian computations, which simplifies implementation and reduces compute cost. In experiments on Lorenz-96 and two-layer quasi-geostrophic dynamics (including a reservoir computing surrogate), the method achieves RMSE comparable to standard 4D-Var while running substantially faster (about an order of magnitude in the quasi-geostrophic experiments when the approximate Hessian is used), with near-linear scaling as system size increases. The work also gives practical guidelines for tuning the learning rate via Bayesian optimization and points to deeper integration of differentiable modeling and data assimilation in next-generation weather forecasting systems.
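To make the role of backpropagation concrete, the following is a minimal JAX sketch, not the authors' code, of obtaining the gradient of a 4D-Var-style cost with respect to the initial state of a Lorenz-96 model by differentiating through the model rollout. The 36-variable size, the time step of 0.01, the 18 observed variables, and the observation noise of 0.5 echo the Lorenz-96 setup mentioned in the figure captions. The forcing `F = 8.0`, the window length, the background error, the diagonal covariances, and all function names are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

N = 36          # state dimension
DT = 0.01       # model time step
F = 8.0         # assumed forcing (a common default, not stated in this summary)
N_STEPS = 10    # hypothetical assimilation window length
SIGMA_B = 1.0   # hypothetical background error standard deviation
SIGMA_O = 0.5   # observation error standard deviation
OBS_IDX = jnp.arange(0, N, 2)  # observe every other variable (18 of 36)


def l96_tendency(x):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with periodic indices
    return (jnp.roll(x, -1) - jnp.roll(x, 2)) * jnp.roll(x, 1) - x + F


def rk4_step(x):
    # One fourth-order Runge-Kutta step of the Lorenz-96 equations
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * DT * k1)
    k3 = l96_tendency(x + 0.5 * DT * k2)
    k4 = l96_tendency(x + DT * k3)
    return x + DT / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def rollout(x0):
    # Integrate N_STEPS forward and return the states, shape (N_STEPS, N)
    def body(x, _):
        x_next = rk4_step(x)
        return x_next, x_next

    _, traj = jax.lax.scan(body, x0, None, length=N_STEPS)
    return traj


def cost(x0, xb, obs):
    # Background term plus observation term, assuming diagonal covariances
    traj = rollout(x0)
    jb = 0.5 * jnp.sum(((x0 - xb) / SIGMA_B) ** 2)
    jo = 0.5 * jnp.sum(((traj[:, OBS_IDX] - obs) / SIGMA_O) ** 2)
    return jb + jo


# Reverse-mode differentiation through the rollout stands in for a
# hand-written adjoint model.
grad_cost = jax.jit(jax.grad(cost))

truth = jnp.full((N,), F).at[0].add(0.01)   # slightly perturbed rest state
obs = rollout(truth)[:, OBS_IDX]            # noise-free synthetic observations
xb = truth + 0.1 * jax.random.normal(jax.random.PRNGKey(0), (N,))
g = grad_cost(xb, xb, obs)
print(g.shape, float(jnp.linalg.norm(g)))
```

The only model-specific code here is the forward step; swapping `rk4_step` for any other differentiable step function, such as an ML surrogate, leaves the gradient computation unchanged.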

Abstract

Constraining a numerical weather prediction (NWP) model with observations via 4D variational (4D-Var) data assimilation is often difficult to implement in practice due to the need to develop and maintain a software-based tangent linear model and adjoint model. One of the most common 4D-Var algorithms uses an incremental update procedure, which has been shown to be an approximation of the Gauss-Newton method. Here we demonstrate that when using a forecast model that supports automatic differentiation, an efficient and in some cases more accurate alternative approximation of the Gauss-Newton method can be applied by combining backpropagation of errors with Hessian approximation. This approach can be used with either a conventional numerical model implemented within a software framework that supports automatic differentiation, or a machine learning (ML) based surrogate model. We test the new approach on a variety of Lorenz-96 and quasi-geostrophic models. The results indicate potential for a deeper integration of modeling, data assimilation, and new technologies in a next-generation of operational forecast systems that leverage weather models designed to support automatic differentiation.
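As an illustration of combining backpropagation with a Hessian approximation, the sketch below applies a generic damped Gauss-Newton update to the initial condition of a small differentiable model in JAX. It is a sketch under stated assumptions rather than the paper's algorithm: the toy `step` function, the diagonal error covariances, the learning rate of 0.5, and the choice of the Jacobian product of the whitened residuals as the approximate Hessian are all hypothetical, and the specific Hessian approximation used by the authors is not given in this summary.

```python
import jax
import jax.numpy as jnp

N, STEPS, DT = 6, 5, 0.1
SIGMA_B, SIGMA_O = 1.0, 0.1  # assumed background / observation error std devs


def step(x):
    # Hypothetical toy forecast model; any differentiable function would do.
    return x + DT * (-x + jnp.sin(jnp.roll(x, 1)))


def trajectory(x0):
    # Roll the model forward and return the states, shape (STEPS, N)
    def body(x, _):
        x_next = step(x)
        return x_next, x_next

    _, traj = jax.lax.scan(body, x0, None, length=STEPS)
    return traj


def residuals(x0, xb, obs):
    # Whitened residuals, so that cost = 0.5 * ||r||^2
    r_b = (x0 - xb) / SIGMA_B
    r_o = (trajectory(x0) - obs).ravel() / SIGMA_O
    return jnp.concatenate([r_b, r_o])


def cost(x0, xb, obs):
    r = residuals(x0, xb, obs)
    return 0.5 * jnp.dot(r, r)


@jax.jit
def gauss_newton_step(x0, xb, obs, lr):
    grad = jax.grad(cost)(x0, xb, obs)         # gradient via backpropagation
    jac = jax.jacfwd(residuals)(x0, xb, obs)   # Jacobian of residuals w.r.t. x0
    hess_approx = jac.T @ jac                  # Gauss-Newton: second-order terms dropped
    return x0 - lr * jnp.linalg.solve(hess_approx, grad)


truth = jnp.linspace(-1.0, 1.0, N)
obs = trajectory(truth)   # noise-free, fully observed synthetic data
xb = truth + 0.5          # biased background state
x0 = xb
costs = [float(cost(x0, xb, obs))]
for _ in range(10):
    x0 = gauss_newton_step(x0, xb, obs, 0.5)
    costs.append(float(cost(x0, xb, obs)))
print(costs[0], costs[-1])
```

Forming the full Jacobian with `jax.jacfwd` is only practical for small states; for larger systems a cheaper approximation of the Hessian would be needed, which is the trade-off the approximate-Hessian variant in the paper targets.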

Paper Structure (17 sections, 21 equations, 7 figures)

Introduction
Methods
Data Models
Forecast Models
4D-Var Data Assimilation
The Gauss-Newton method
Automatic differentiation
Backprop-4DVar Data Assimilation
Experiment Design
Data
Results
Lorenz 96
Two-Layer Quasi-geostrophic model implemented in PyQG-Jax
Quasi-geostrophic dynamics emulated with a Reservoir Computing Model
Learning Rate Tuning
...and 2 more sections

Figures (7)

Figure 1: (Top left) An example of the evolution of the 36-dimensional Lorenz-96 system over 600 time steps ($\Delta t=0.01$), or 30 days if one model time unit (MTU) corresponds to 5 days. The nature run is shown with observation locations at their appropriate times as white dots. (Top right) A baseline run without data assimilation shows the exponential error growth that occurs when the model is initialized from imperfect initial conditions. (Bottom left) The conventional incremental 4D-Var is used to assimilate the observations to reconstruct a state estimate that remains close to the nature run. (Bottom right) Backprop-4DVar produces state estimates very close to the 4D-Var reconstruction.
Figure 2: A comparison of RMSE for the conventional incremental 4D-Var and Backprop-4DVar with varying observation coverage (x-axis) and observation noise (y-axis). A total of 1620 experiment results are summarized here, with 30 trials run for each combination of observation number and noise (i.e. each cell in the heatmap). Panel (c) shows the mean relative RMSE difference between the two methods, calculated by dividing the difference in RMSE by the 4D-Var RMSE. The standard deviations of the differences in RMSE are shown in panels 3 and 4. Paired Student's t-tests are run for each cell using $\alpha = 0.01$. Cells where $p > \alpha$, i.e. where the difference is not statistically significant, are greyed out, while colored cells represent cases where $p < \alpha$. Positive values, shown in blue, indicate that on average Backprop-4DVar has a lower RMSE than 4D-Var. Negative values, shown in red, indicate that 4D-Var has a lower RMSE than Backprop-4DVar. The cell outlined in bold black has the same experimental configuration (36D with 18 observations and observation noise standard deviation of 0.5) as the 36D experiment shown in Figure 3.
Figure 3: Comparison of 4D-Var and Backprop-4DVar (a) runtimes and (b) RMSE with increasing Lorenz 96 system size. The mean of results from 50 isolated trials with randomized initial conditions and observations is shown with $95\%$ confidence intervals shaded. For reference, the grey line highlights the experiment with the same configuration as the black-outlined cell in Figure 2 (36D with 18 observations and observation noise standard deviation of 0.5).
Figure 4: PyQG experiment contour plots for the bottom layer of a 2048D system (2x32x32 in gridded space) at 3 times during the test period. The top row (a-c) shows the nature run and the observation locations in white, which are randomly sampled at half of the grid locations at every 3rd timestep. The next two rows show the error vs. the nature run for a baseline model run with no data assimilation (d-f) and for 4D-Var (g-i). Finally, the two variants of Backprop-4DVar---with the exact Hessian (j-l) and the approximate Hessian (m-o)---are shown in comparison to 4D-Var. Purple and orange represent lower and higher absolute error than 4D-Var respectively.
Figure 5: (a) Run times (log scale) and (b) RMSE for the QG dynamics using the PyQG-JAX forecast model, for 4D-Var and Backprop-4DVar. An unconstrained free run without data assimilation is provided as a baseline for comparison. While all three DA methods show similar performance in terms of RMSE, Backprop-4DVar using the approximate Hessian (green) is an order of magnitude faster than the reference methods.
...and 2 more figures
