Adjoint path-kernel method for backpropagation and data assimilation in unstable diffusions
Angxiu Ni
TL;DR
The paper develops an adjoint path-kernel framework for computing parameter-gradients of discrete-time and continuous-time stochastic systems, including non-hyperbolic dynamics with multiplicative noise. A key feature is a main term shared across all parameters. This makes the cost nearly independent of the number of parameters and enables gradient-based optimization in high dimensions and over long horizons, even when gradients explode. The author demonstrates the approach on the Lorenz-96 system with multiplicative noise and applies it to a challenging 4D-Var data assimilation problem with partial observations and unknown dynamics parameters, solved by stochastic gradient descent. The work supports stable long-horizon learning and parameter inference in chaotic diffusions and offers practical tools for high-dimensional data assimilation.
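The following is a minimal sketch of why conventional pathwise differentiation breaks down over long horizons. It is not the paper's adjoint path-kernel method. It propagates the derivative of the state with respect to the forcing parameter F along one deterministic Lorenz-96 trajectory (40 coordinates, F = 8), using forward Euler with a forward-mode tangent recursion v_{n+1} = v_n + dt (J(x_n) v_n + df/dF). Backpropagation through the same recursion computes this same derivative contracted with the gradient of an objective, so for a generic objective it explodes at the same rate. The step size, spin-up length, and function names are illustrative assumptions.

```python
import numpy as np


def lorenz96_drift(x, F):
    """Lorenz-96 drift f_i = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F (cyclic indices)."""
    xp1 = np.roll(x, -1, axis=-1)  # x_{i+1}
    xm2 = np.roll(x, 2, axis=-1)   # x_{i-2}
    xm1 = np.roll(x, 1, axis=-1)   # x_{i-1}
    return (xp1 - xm2) * xm1 - x + F


def lorenz96_jvp(x, v):
    """Jacobian-vector product J(x) v of the drift with respect to the state."""
    return ((np.roll(v, -1) - np.roll(v, 2)) * np.roll(x, 1)
            + (np.roll(x, -1) - np.roll(x, 2)) * np.roll(v, 1)
            - v)


def parameter_tangent_norms(n_dim=40, F=8.0, dt=0.005, spinup=5.0,
                            horizons=(1.0, 5.0, 20.0)):
    """Norm of v_t = d x_t / dF along one forward-Euler path, at several horizons."""
    x = F * np.ones(n_dim)
    x[0] += 0.01  # perturb the unstable fixed point x_i = F
    for _ in range(int(round(spinup / dt))):  # settle onto the attractor
        x = x + dt * lorenz96_drift(x, F)
    checkpoints = {int(round(h / dt)): h for h in horizons}
    # Treat the spun-up state as a fixed initial condition, so d x_0 / dF = 0.
    v = np.zeros(n_dim)
    norms = {}
    for k in range(1, max(checkpoints) + 1):
        # Differentiate x_k = x_{k-1} + dt f(x_{k-1}; F); df/dF = 1 in every coordinate.
        v = v + dt * (lorenz96_jvp(x, v) + 1.0)
        x = x + dt * lorenz96_drift(x, F)
        if k in checkpoints:
            norms[checkpoints[k]] = float(np.linalg.norm(v))
    return x, norms


if __name__ == "__main__":
    _, norms = parameter_tangent_norms()
    for horizon, norm in sorted(norms.items()):
        print(f"T = {horizon:5.1f}: |dx_T/dF| = {norm:.3e}")
```

The printed norms typically grow by many orders of magnitude between the short and long horizons, at a rate set by the leading Lyapunov exponent. For this reason, naive backpropagation of long-time functionals of chaotic systems does not yield usable gradients.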
Abstract
We derive the adjoint path-kernel method for computing parameter-gradients (linear responses) of SDEs. Its cost is almost independent of the number of parameters, and it works for non-hyperbolic systems with parameter-controlled multiplicative noise. With this new formula, we extend the conventional backpropagation method to settings with gradient explosion, and demonstrate it on the 40-dimensional Lorenz-96 system. Moreover, we consider a difficult version of the 4D-Var data assimilation problem where (1) the deterministic part of the model is chaotic, (2) the loss is a single long-time functional accounting for discrepancies in both the observations and the dynamics, (3) some parameters in the dynamics are unknown, and (4) some coordinates of the states cannot be observed, and cannot be reasonably inferred from other coordinates within a short time. We model the correction term at each time-step separately as a parameterized function of the random state. With our new tool, we can run stochastic gradient descent to find the path and parameters that best match the low-dimensional observation data. We demonstrate this on the 10D Lorenz-96 system with 8D observations.
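To make the 4D-Var setting concrete, here is a small sketch in which every modeling choice is an illustrative assumption rather than the paper's formulation. The multiplicative noise is taken as gamma * diag(x) dW. The loss is a standard weak-constraint 4D-Var cost with constant weights r and q. The forcing F is the only unknown parameter. The code simulates a 10-dimensional Lorenz-96 path with Euler-Maruyama and observes 8 of its 10 coordinates with noise. It then fits F by plain gradient descent on the cost while holding the path at the true one. All function names and parameter values are hypothetical.

```python
import numpy as np


def lorenz96_drift(x, F):
    """Lorenz-96 drift f_i = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F.

    Works on one state (shape (d,)) or a batch of states (shape (n, d)),
    with cyclic indexing along the last axis.
    """
    xp1 = np.roll(x, -1, axis=-1)  # x_{i+1}
    xm2 = np.roll(x, 2, axis=-1)   # x_{i-2}
    xm1 = np.roll(x, 1, axis=-1)   # x_{i-1}
    return (xp1 - xm2) * xm1 - x + F


def simulate(n_dim=10, F=8.0, gamma=0.1, dt=0.005, n_steps=2000, seed=0):
    """Euler-Maruyama for dx = f(x; F) dt + gamma * diag(x) dW (hypothetical noise model)."""
    rng = np.random.default_rng(seed)
    x = F * np.ones(n_dim)
    x[0] += 0.01  # perturb the unstable fixed point x_i = F
    path = np.empty((n_steps + 1, n_dim))
    path[0] = x
    for n in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_dim)
        x = x + dt * lorenz96_drift(x, F) + gamma * x * dW
        path[n + 1] = x
    return path


def dynamics_residual(path, F, dt):
    """r_n = x_{n+1} - x_n - dt * f(x_n; F) for every step n."""
    return path[1:] - path[:-1] - dt * lorenz96_drift(path[:-1], F)


def fourdvar_cost(path, F, obs, obs_idx, dt, r=0.1, q=0.1):
    """Weak-constraint 4D-Var style cost: observation misfit + dynamics misfit."""
    obs_misfit = path[:, obs_idx] - obs
    resid = dynamics_residual(path, F, dt)
    return 0.5 * np.sum(obs_misfit**2) / r**2 + 0.5 * np.sum(resid**2) / (q**2 * dt)


def grad_F(path, F, dt, q=0.1):
    """dJ/dF: every residual entry has dr/dF = -dt, so dJ/dF = -sum(r) / q**2."""
    return -np.sum(dynamics_residual(path, F, dt)) / q**2


def estimate_F(path, dt, F0=5.0, q=0.1, n_iter=50):
    """Gradient descent on F with the path held fixed (the cost is quadratic in F)."""
    curvature = (path.shape[0] - 1) * path.shape[1] * dt / q**2  # d^2 J / dF^2
    F = F0
    for _ in range(n_iter):
        F -= 0.5 / curvature * grad_F(path, F, dt, q)  # halves the error each step
    return F


if __name__ == "__main__":
    dt = 0.005
    truth = simulate(dt=dt)
    obs_idx = np.arange(8)  # observe 8 of the 10 coordinates
    rng = np.random.default_rng(1)
    obs = truth[:, obs_idx] + 0.1 * rng.standard_normal((truth.shape[0], obs_idx.size))
    F_hat = estimate_F(truth, dt)
    print(F_hat, fourdvar_cost(truth, F_hat, obs, obs_idx, dt))
```

Holding the path at the truth makes the F-subproblem quadratic, so gradient descent converges quickly. In the problem the abstract describes, the two unobserved coordinates must be estimated jointly with the parameters over a long chaotic window. That joint estimation is where exploding long-horizon gradients become the bottleneck.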
