$\Phi$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation

Alex Glyn-Davies, Connor Duffin, Ö. Deniz Akyildiz, Mark Girolami

TL;DR

Φ-DVAE addresses the challenge of incorporating unstructured data into physics-based models by marrying a data-driven VAE encoder with a physics-informed latent state-space model governed by discretized dynamics of $\mathbf{u}_n$ and unknown parameters $\mathbf{\Lambda}$. The approach uses a variational Bayesian framework to jointly infer latent encodings $\mathbf{x}_{1:N}$, latent states $\mathbf{u}_{1:N}$, and $\mathbf{\Lambda}$, leveraging a discretized stochastic PDE (statFEM) as the latent dynamics and a pseudo-observation model linking $\mathbf{u}_n$ to $\mathbf{x}_n$ via $\mathbf{x}_n = \mathbf{H}\mathbf{u}_n + \mathbf{r}_n$. Inference combines a VAE encoder $q_\phi(\mathbf{x}|\mathbf{y})$ with an extended Kalman-type filter (ExKF) for the latent states, and learning optimises encoder/decoder parameters $\phi,\theta$ and a variational posterior $q_\lambda(\mathbf{\Lambda})$ over the physics parameters, yielding data-efficient encodings and uncertainty quantification. Experiments on Advection, Lorenz-63, and KdV demonstrate accurate latent state estimation, parameter recovery with credible intervals, and robust future predictions, often outperforming a baseline KVAE due to the physics-informed latent dynamics. The framework thus offers a principled, Bayesian route for unstructured data assimilation in complex dynamical systems with unknown observation operators.
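
To make the pseudo-observation idea concrete, the following is a minimal sketch of one filtering step in which an encoding $\mathbf{x}_n$ is treated as a noisy linear observation of the latent state, $\mathbf{x}_n = \mathbf{H}\mathbf{u}_n + \mathbf{r}_n$. It is not the authors' implementation: the paper uses statFEM-discretized dynamics with an extended Kalman-type filter and a learned encoder, whereas this toy assumes linear latent dynamics (a hypothetical rotation matrix `A`), Gaussian noise covariances `Q` and `R` chosen for illustration, and simulated encodings in place of encoder outputs. The function name `kalman_step` is likewise hypothetical.

```python
import numpy as np


def kalman_step(m, P, x, A, Q, H, R):
    """One predict/update step treating the encoding x as a pseudo-observation."""
    # Predict: push the latent-state mean and covariance through the
    # (toy, linear) dynamics u_n = A u_{n-1} + noise, noise ~ N(0, Q).
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition on x = H u + r, with r ~ N(0, R).
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # gain K = P_pred H^T S^{-1} (S is symmetric)
    m_new = m_pred + K @ (x - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new


# Toy setup (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
angle = 0.1
A = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])   # hypothetical latent dynamics
Q = 1e-4 * np.eye(2)                             # process noise covariance
H = np.array([[1.0, 0.0]])                       # only the first state component is observed
R = np.array([[1e-2]])                           # pseudo-observation noise covariance

u = np.array([1.0, 0.0])                         # true latent state
m, P = np.zeros(2), np.eye(2)                    # prior mean and covariance
for n in range(50):
    u = A @ u + rng.multivariate_normal(np.zeros(2), Q)
    x = H @ u + rng.multivariate_normal(np.zeros(1), R)  # stand-in for an encoder output
    m, P = kalman_step(m, P, x, A, Q, H, R)
```

In the full method, `x` would come from the encoder $q_\phi(\mathbf{x}|\mathbf{y})$ rather than from a simulation, and the filter would be used during training as well as for inference of $\mathbf{u}_{1:N}$.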

Abstract

Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($\Phi$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model with a VAE to assimilate the unstructured data into the latent dynamical system. Unstructured data, in our example systems, comes in the form of video data and velocity field measurements; however, the methodology is suitably generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results, with synthetic data, show that $\Phi$-DVAE provides a data-efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted.

Paper Structure (18 sections, 51 equations, 9 figures, 1 table, 1 algorithm)


Figures (9)

  • Figure 1: An illustration of the $\Phi$-DVAE model. On the left, the video frames are seen, denoted ${\mathbf{y}}_{1:N}$. These are converted into physically interpretable low-dimensional encodings ${\mathbf{x}}_{1:N}$ using an encoder. The learning is informed by the physics-driven state-space model, which treats ${\mathbf{x}}_{1:N}$ as pseudo-observations (bottom right). These pseudo-observations are used to infer the latent states ${\mathbf{u}}_{1:N}$.
  • Figure 2: Flow diagram describing connections between the specified PDE, latent SSM and VAE. Inputs to the method include the unstructured data ${\mathbf{y}}_{1:N}$, the specified PDE $u(x, s)$ and the model parameter prior (solid grey fill). Outputs are the trained autoencoder parameters $\left\{\phi^\star, \theta^\star\right\}$, and the model parameter posterior distribution parameters $\lambda^\star$ (striped grey fill). Labels are included referencing the relevant sections.
  • Figure 3: Comparison of KVAE and $\Phi$-DVAE for the advection equation.
  • Figure 4: Lorenz-63: latent states ${\mathbf{u}}_{1:N}$, pseudo-observations ${\mathbf{x}}_{1:N}$, and velocity field ${\mathbf{y}}_N$.
  • Figure 5: Left: Ground truth pseudo-observations (dashed) compared to encoding ${\mathbf{x}}_{1:N} \sim q_{\phi^{\star}}(\cdot | {\mathbf{y}}_{1:N})$ (solid). Training data indicated by grey-fill ($t=\left[ 0, 2\right)$), red lines indicate reconstruction times. Right: Observed data ${\mathbf{y}}_n$ plotted alongside reconstructions $\hat{{\mathbf{y}}}_{n}$, with the velocity field shown as a streamplot where the color indicates the speed of the flow.
  • ...and 4 more figures