$Φ$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation
Alex Glyn-Davies, Connor Duffin, Ö. Deniz Akyildiz, Mark Girolami
TL;DR
Φ-DVAE addresses the challenge of incorporating unstructured data into physics-based models by marrying a data-driven VAE encoder with a physics-informed latent state-space model governed by discretized dynamics of $\mathbf{u}_n$ and unknown parameters $\mathbf{\Lambda}$. The approach uses a variational Bayesian framework to jointly infer latent encodings $\mathbf{x}_{1:N}$, latent states $\mathbf{u}_{1:N}$, and $\mathbf{\Lambda}$, leveraging a discretized stochastic PDE (statFEM) as the latent dynamics and a pseudo-observation model linking $\mathbf{u}_n$ to $\mathbf{x}_n$ via $\mathbf{x}_n = \mathbf{H}\mathbf{u}_n + \mathbf{r}_n$. Inference combines a VAE encoder $q_\phi(\mathbf{x}|\mathbf{y})$ with an extended Kalman-type filter (ExKF) for the latent states, and learning optimises encoder/decoder parameters $\phi,\theta$ and a variational posterior $q_\lambda(\mathbf{\Lambda})$ over the physics parameters, yielding data-efficient encodings and uncertainty quantification. Experiments on Advection, Lorenz-63, and KdV demonstrate accurate latent state estimation, parameter recovery with credible intervals, and robust future predictions, often outperforming a baseline KVAE due to the physics-informed latent dynamics. The framework thus offers a principled, Bayesian route for unstructured data assimilation in complex dynamical systems with unknown observation operators.
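To make the pseudo-observation model $\mathbf{x}_n = \mathbf{H}\mathbf{u}_n + \mathbf{r}_n$ concrete, the following is a minimal sketch of a single Kalman measurement update. It assumes a Gaussian prior on the latent state, a one-dimensional encoding (so $\mathbf{H}$ reduces to a row vector `h`), and Gaussian noise $\mathbf{r}_n$ with variance `r_var`. The function name `kalman_update` and all variable names are illustrative and do not come from the paper. Only the linear measurement step is shown; the prediction step through the discretized dynamics, the encoder, and the decoder are omitted.

```python
def kalman_update(m, P, h, r_var, x):
    """One Kalman measurement update for a scalar pseudo-observation.

    Assumed model: x = h . u + r, with r ~ N(0, r_var) and prior u ~ N(m, P).
    m is a list of length d, P is a symmetric d x d list of lists,
    h is a list of length d, and x is the (scalar) latent encoding.
    """
    d = len(m)
    # P h^T: cross-covariance between the state and the pseudo-observation.
    Ph = [sum(P[i][j] * h[j] for j in range(d)) for i in range(d)]
    # Predicted pseudo-observation and its (innovation) variance.
    x_pred = sum(h[i] * m[i] for i in range(d))
    s = sum(h[i] * Ph[i] for i in range(d)) + r_var
    # Kalman gain.
    k = [Ph[i] / s for i in range(d)]
    # Posterior mean and covariance (uses the symmetry of P, so h P = (P h^T)^T).
    m_new = [m[i] + k[i] * (x - x_pred) for i in range(d)]
    P_new = [[P[i][j] - k[i] * Ph[j] for j in range(d)] for i in range(d)]
    return m_new, P_new


# Toy usage: a two-dimensional state where only the first component is observed.
m_post, P_post = kalman_update(
    m=[0.0, 0.0],
    P=[[1.0, 0.0], [0.0, 1.0]],
    h=[1.0, 0.0],
    r_var=1.0,
    x=2.0,
)
print(m_post)  # [1.0, 0.0]
print(P_post)  # [[0.5, 0.0], [0.0, 1.0]]
```

In this toy case the observed component moves halfway towards the encoding and its variance halves, while the unobserved component is left unchanged because the prior covariance has no cross-correlation.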
Abstract
Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($Φ$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model with a VAE to assimilate the unstructured data into the latent dynamical system. In our example systems, unstructured data comes in the form of video data and velocity field measurements; however, the methodology is sufficiently generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results on synthetic data show that $Φ$-DVAE provides a data-efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted.
