Jacobian-Enforced Neural Networks (JENN) for Improved Data Assimilation Consistency in Dynamical Models

Xiaoxu Tian

TL;DR

This work tackles the gap between ML-based dynamical emulators and data assimilation systems caused by the lack of explicit sensitivity representations. It introduces the Jacobian-Enforced Neural Network (JENN), which adds tangent linear and adjoint information to the training loss via $L_{forecast}$, $L_{TLM}$, and $L_{ADJ}$, optimized in two stages. Using the Lorenz 96 model as a testbed, JENN preserves nonlinear forecast skill while substantially improving TL/AD fidelity and the Jacobian $J$, enabling more reliable data assimilation. The framework is compatible with pretrained models (e.g., GraphCast, NeuralGCM, Pangu, FuXi) and offers a practical pathway to integrating ML emulators into operational data assimilation pipelines.
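
One plausible way to write the three-term objective is given here as a sketch rather than the paper's exact formulation. The weights $\alpha$ and $\beta$, the network symbol $M_\theta$, the perturbation $\delta x$, and the adjoint input $\hat{y}$ are notational assumptions; only the three loss names, the use of RMSE, and the Jacobian $J$ come from the text.

$$
\begin{aligned}
L_{total} &= L_{forecast} + \alpha\, L_{TLM} + \beta\, L_{ADJ}, \\
L_{forecast} &= \mathrm{RMSE}\big(M_\theta(x),\; y\big), \\
L_{TLM} &= \mathrm{RMSE}\big(J_\theta(x)\,\delta x,\; J(x)\,\delta x\big), \\
L_{ADJ} &= \mathrm{RMSE}\big(J_\theta(x)^{\top}\hat{y},\; J(x)^{\top}\hat{y}\big),
\end{aligned}
$$

where $J_\theta(x) = \partial M_\theta / \partial x$ is the Jacobian of the network and $J(x)$ is the Jacobian of the physical model evaluated at the same state $x$.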

Abstract

Machine learning-based weather models have shown great promise in producing accurate forecasts but have struggled when applied to data assimilation (DA) tasks, unlike traditional numerical weather prediction (NWP) models. This study introduces the Jacobian-Enforced Neural Network (JENN) framework, designed to enhance DA consistency in neural network (NN)-emulated dynamical systems. Using the Lorenz 96 model as an example, the approach demonstrates improved applicability of NNs in DA through explicit enforcement of Jacobian relationships. The NN architecture includes an input layer of 40 neurons, two hidden layers with 256 units each employing hyperbolic tangent activation functions, and an output layer of 40 neurons without activation. The JENN framework employs a two-step training process: an initial phase using standard prediction-label pairs to establish baseline forecast capability, followed by a secondary phase incorporating a customized loss function to enforce accurate Jacobian relationships. This loss function combines root mean square error (RMSE) between predicted and true state values with additional RMSE terms for tangent linear (TL) and adjoint (AD) emulation results, weighted to balance forecast accuracy and Jacobian sensitivity. To ensure consistency, the secondary training phase uses additional pairs of TL/AD inputs and labels calculated from the physical models. Notably, this approach does not require starting from scratch or structural modifications to the NN, making it readily applicable to pretrained models such as GraphCast, NeuralGCM, Pangu, or FuXi, facilitating their adaptation for DA tasks with minimal reconfiguration. Experimental results demonstrate that the JENN framework preserves nonlinear forecast performance while significantly reducing noise in the TL and AD components, as well as in the overall Jacobian matrix.
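
The TL/AD training pairs "calculated from the physical models" could be generated along the following lines. This is a minimal JAX sketch under stated assumptions, not the author's code: the forcing `FORCING = 8.0`, the time step `DT = 0.05`, the Runge-Kutta integrator, and the helper names `l96_tendency`, `l96_step`, and `make_labels` are all hypothetical. It also obtains the tangent linear and adjoint products by automatic differentiation of the Lorenz 96 step, whereas the paper may use hand-coded TL/AD models.

```python
import jax
import jax.numpy as jnp

N = 40          # state size, matching the 40-neuron input and output layers
FORCING = 8.0   # assumed: the commonly used chaotic forcing, not given in the text
DT = 0.05       # assumed time step, not given in the text


def l96_tendency(x):
    """Lorenz 96: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F (cyclic)."""
    return (jnp.roll(x, -1) - jnp.roll(x, 2)) * jnp.roll(x, 1) - x + FORCING


def l96_step(x):
    """One fourth-order Runge-Kutta step (the integrator is an assumption)."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * DT * k1)
    k3 = l96_tendency(x + 0.5 * DT * k2)
    k4 = l96_tendency(x + DT * k3)
    return x + DT / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def make_labels(x, dx, y_hat):
    """Return forecast, TL, and AD labels for one sample.

    x     : state at which the model is linearised
    dx    : input perturbation for the tangent linear model
    y_hat : output-space vector fed to the adjoint model
    """
    y, tl = jax.jvp(l96_step, (x,), (dx,))   # forecast and J(x) @ dx
    _, vjp_fn = jax.vjp(l96_step, x)
    (ad,) = vjp_fn(y_hat)                    # J(x).T @ y_hat
    return y, tl, ad
```

A useful sanity check on such labels is the standard adjoint identity: the inner product of the TL output with `y_hat` should equal the inner product of `dx` with the AD output, up to floating-point error.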

Paper Structure

This paper contains 6 sections, 4 equations, 5 figures.

Figures (5)

  • Figure 1: Illustration of the neural network structure used to emulate the Lorenz 96 model, with an input layer of 40 nodes, two hidden layers of 256 nodes each, and an output layer of 40 nodes. The diagram highlights three key data flows: the nonlinear forward pass (orange), the tangent linear propagation (blue), and the backward adjoint propagation (green). Each stream contributes to the total loss function, which combines the nonlinear forecast loss, the tangent linear loss, and the adjoint loss.
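
The three data flows in the network diagram (nonlinear forward pass, tangent linear propagation, backward adjoint propagation) can be sketched in JAX as follows. The layer sizes and tanh activations follow the text; everything else is an assumption made for illustration: the weight initialisation, the weights `alpha` and `beta`, the single-sample (unbatched) interface, and the function names `init_params`, `forward`, `rmse`, and `jenn_loss`.

```python
import jax
import jax.numpy as jnp

SIZES = (40, 256, 256, 40)  # input, two hidden layers, output (from the text)


def init_params(key, sizes=SIZES):
    """Scaled random weights and zero biases (the initialisation is an assumption)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (n_in, n_out)) / jnp.sqrt(n_in)
        params.append((w, jnp.zeros(n_out)))
    return params


def forward(params, x):
    """Nonlinear pass: tanh on both hidden layers, no activation on the output."""
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def rmse(a, b):
    """Root mean square error between two vectors."""
    return jnp.sqrt(jnp.mean((a - b) ** 2))


def jenn_loss(params, x, y_true, dx, tl_true, y_hat, ad_true,
              alpha=1.0, beta=1.0):
    """Forecast RMSE plus weighted TL and AD RMSE terms for one sample.

    alpha and beta are hypothetical weights; the text only says the
    terms are weighted to balance forecast accuracy and sensitivity.
    """
    net = lambda s: forward(params, s)
    y_pred, tl_pred = jax.jvp(net, (x,), (dx,))   # forward pass and J_theta @ dx
    _, vjp_fn = jax.vjp(net, x)
    (ad_pred,) = vjp_fn(y_hat)                    # J_theta.T @ y_hat
    return (rmse(y_pred, y_true)
            + alpha * rmse(tl_pred, tl_true)
            + beta * rmse(ad_pred, ad_true))
```

In a second training stage of this kind, `jax.grad(jenn_loss)` would give the gradient with respect to `params`, since JAX can differentiate through `jax.jvp` and `jax.vjp`. Setting `alpha` and `beta` to zero reduces the objective to the forecast-only loss of the first stage. Note that the square root in `rmse` has an undefined gradient when the error is exactly zero, so a practical implementation may need a small offset there.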