Auto-differentiable data assimilation: Co-learning of states, dynamics, and filtering algorithms

Melissa Adrian, Daniel Sanz-Alonso, Rebecca Willett

Abstract

Data assimilation algorithms estimate the state of a dynamical system from partial observations. Their performance hinges on costly parameter tuning and on an accurate model for the dynamics. This paper introduces a framework for jointly learning the state, dynamics, and parameters of filtering algorithms in data assimilation through a process we refer to as auto-differentiable filtering. The framework leverages a theoretically motivated loss function that enables learning from partial, noisy observations via gradient-based optimization using auto-differentiation. We further demonstrate how several well-known data assimilation methods can be learned or tuned within this framework. To underscore the versatility of auto-differentiable filtering, we perform experiments on dynamical systems spanning multiple scientific domains, such as the Clohessy-Wiltshire equations from aerospace engineering, the Lorenz-96 system from atmospheric science, and the generalized Lotka-Volterra equations from systems biology. Finally, we provide guidelines for practitioners to customize our framework according to their observation model, accuracy requirements, and computational budget.

Paper Structure (68 sections, 30 equations, 13 figures, 3 tables, 3 algorithms)

Figures (13)

  • Figure 1: Clohessy-Wiltshire parameter estimation (Section \ref{sec:CW}). Parameter estimation error of the learned forecast parameter $\hat{\theta}_1$ relative to the ground-truth parameter $\theta_1^*=0.0013$ across choices of $\sigma^2_0$ and of the variance on the diagonal of $R$. The shaded regions correspond to the 0.25 and 0.75 quantiles of errors across 10 independent simulations. The dashed black line (c) additionally shows the median error of the initialized parameters $\{\theta_{1,j}^{(0)}\}_{j=1}^J$ with $J=10$, representing the errors in the absence of any learning, and the gray region corresponds to the 0.25 and 0.75 quantiles of test forecast RMSEs across these 10 initialized models.
  • Figure 2: Clohessy-Wiltshire filtering (Section \ref{sec:CW}). Visualization of observations, true trajectories, and estimated positions $x_1$ and $x_3$ and velocities $\frac{dx_1}{dt}$ and $\frac{dx_3}{dt}$ for each of the four filtering methods using the learned forecast parameters $\hat{\theta}$ and learned filtering parameters $\hat{\phi}$, evaluated on test data. In this example, the observation noise is fixed at $R = 0.1I_{d_y\times d_y}$, and the variance of the initialization of $\theta_1^{(0)}$ is fixed at $\sigma_0^2 = 0.03$.
  • Figure 3: Clohessy-Wiltshire data log-likelihood (Section \ref{sec:CW}). Plots of the forecast log-likelihood from \ref{eq:forecast_LL}, evaluated at test observations assimilated into filters parameterized by the learned forecast parameters $\hat{\theta}$ and learned filtering parameters $\hat{\phi}$ from AD-3DVar-$C$, AD-EnKF ($N=25$), and AD-Ens3DVar ($N=25$). The algorithm used for filtering corresponds to the algorithm used in the learning task (i.e., EnKF is used as the filter with the learned parameters from AD-EnKF). For comparison, each plot also shows the log-likelihood from the optimal solution to the filtering task, the Kalman filter (detailed in Algorithm \ref{alg:CW_KF}) using the true forecast parameters $\theta^*$.
  • Figure 4: Lorenz-96 learned forecast performance (Section \ref{sec:L96}). Test performance of the Lorenz-96 model in \ref{eq:l96} learned with each method: AD-3DVar-$C$, AD-3DVar-$K$, AD-EnKF with a covariance tapering radius of 5 and $N=25$, and AD-Ens3DVar with a covariance tapering radius of 5 and $N=25$. The shaded regions correspond to the 0.2 and 0.8 quantiles of test forecast performance across 10 independent simulations. In the plot varying $d_y/d_x$, the dashed black line (c) additionally shows the median forecast RMSE of the initialized models $\mathcal{F}_{\beta_j}$ for $j=1,\dots,J$ with $J=10$, and the gray region corresponds to the 0.2 and 0.8 quantiles of test forecast RMSEs across these 10 initialized models.
  • Figure 5: Lorenz-96 filtering (Section \ref{sec:L96}). Visualization of filtering on test observation data using the learned $\hat{\theta}$ and $\hat{\phi}$ values from AD-3DVar-$C$, AD-3DVar-$K$, AD-EnKF with $N=25$ ensemble members and a tapering radius of 5, and AD-Ens3DVar with $N=25$ ensemble members and a tapering radius of 5, compared to the noisy observations and ground-truth Lorenz-96 simulations. In this plot, the settings of the filtering problem are $R=I_{d_y\times d_y}$, $d_y/d_x=0.6$, $\sigma^2_0=1$, and $d_x=200$. For ease of visualization, only the first 50 of the 200 state components are shown. Filtering is done with the base filtering algorithm used during training (i.e., the results presented for AD-EnKF use the learned $\hat{\theta}$ and $\hat{\phi}$ in an EnKF algorithm).
  • ...and 8 more figures
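
The Figure 3 caption refers to a forecast log-likelihood evaluated at observations, with the Kalman filter as the reference filter. As a rough illustration of how such a quantity can be differentiated with respect to a forecast parameter, the sketch below uses a toy scalar linear-Gaussian model, not any system from the paper. Everything in it is an assumption made for illustration: the model $x_{t+1} = a x_t + \text{noise}$, the standard Gaussian form of the log-likelihood (the paper's own definition is in the referenced equation, which is not reproduced here), the hypothetical names `Dual`, `simulate`, `forecast_log_likelihood`, and `fit`, and the use of hand-rolled forward-mode dual numbers in place of an auto-differentiation library.

```python
import math
import random


class Dual:
    """Forward-mode dual number: a value and its derivative w.r.t. one parameter."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def log(x):
    """Natural log that also propagates derivatives through Dual inputs."""
    if isinstance(x, Dual):
        return Dual(math.log(x.val), x.der / x.val)
    return math.log(x)


def simulate(a_true, n_steps, q=0.1, r=0.1, seed=0):
    """Noisy observations y_t = x_t + N(0, r) of x_{t+1} = a_true * x_t + N(0, q)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = a_true * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys


def forecast_log_likelihood(a, ys, q=0.1, r=0.1, m0=0.0, p0=1.0):
    """Scalar Kalman filter; returns sum_t log N(y_t; m_pred_t, p_pred_t + r)."""
    m, p, total = m0, p0, 0.0
    for y in ys:
        m_pred = a * m                # forecast mean
        p_pred = a * a * p + q        # forecast variance
        s = p_pred + r                # innovation variance
        innov = y - m_pred
        total = total - 0.5 * (log(2.0 * math.pi * s) + innov * innov / s)
        k = p_pred / s                # Kalman gain
        m = m_pred + k * innov        # analysis mean
        p = (1.0 - k) * p_pred        # analysis variance
    return total


def fit(ys, a_init=0.2, lr=0.05, n_iters=200):
    """Gradient ascent on the mean forecast log-likelihood w.r.t. a."""
    a = a_init
    for _ in range(n_iters):
        ll = forecast_log_likelihood(Dual(a, 1.0), ys)  # seed d(a)/d(a) = 1
        a += lr * ll.der / len(ys)
    return a


ys = simulate(a_true=0.9, n_steps=200)
print("estimated a:", fit(ys))
```

Seeding the parameter as `Dual(a, 1.0)` carries the derivative through every forecast and analysis step, so one pass over the observations returns both the log-likelihood and its gradient. A reverse-mode auto-differentiation library would be the more usual choice when many forecast and filtering parameters are learned at once.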