Data assimilation and discrepancy modeling with shallow recurrent decoders
Yuxuan Bao, J. Nathan Kutz
TL;DR
The paper tackles the challenge of closing the simulation-to-real (SIM2REAL) gap in high-dimensional, spatiotemporal systems observed through sparse sensors. It introduces DA-SHRED, a hybrid framework that takes the latent space SHRED learns from simulations, refines it with real sensor data, and uses sparse identification of nonlinear dynamics (SINDy) to discover the missing physics L' in the latent dynamics. In demonstrations on the 2D damped Kuramoto–Sivashinsky equation, 2D Kolmogorov flow, the Gray–Scott reaction–diffusion system, and rotating detonation engines, DA-SHRED converges rapidly and reconstructs the state accurately while recovering physically meaningful discrepancy terms. The approach combines temporal encoding, sparse sensing, and interpretable discrepancy modeling, offering a data-efficient path toward real-time assimilation and physics-informed correction in complex systems, with future extensions to adaptive bases and to multiscale or stochastic dynamics.
Abstract
The requirements of modern sensing are rapidly evolving, driven by increasing demands for data efficiency, real-time processing, and deployment under limited sensing coverage. Complex physical systems are often characterized by combining a limited number of point sensors with scientific computations that approximate the dominant, full-state dynamics. Simulation models, however, inevitably neglect small-scale or hidden processes, are sensitive to perturbations, or oversimplify parameter correlations, leading to reconstructions that often diverge from the reality measured by sensors. This creates a critical need for data assimilation, the process of integrating observational data with predictive simulation models to produce coherent and accurate estimates of the full state of complex physical systems. We propose a machine learning framework for Data Assimilation with a SHallow REcurrent Decoder (DA-SHRED), which bridges the simulation-to-real (SIM2REAL) gap between computational modeling and experimental sensor data. For real-world physical systems described by high-dimensional spatiotemporal fields, where the full state cannot be directly observed and must be inferred from sparse sensor measurements, we leverage the latent space learned from a reduced simulation model via SHRED and update these latent variables using real sensor data to accurately reconstruct the full system state. Furthermore, our algorithm incorporates a regression model in the latent space, based on the sparse identification of nonlinear dynamics (SINDy), to identify functionals corresponding to missing dynamics in the simulation model. We demonstrate that DA-SHRED successfully closes the SIM2REAL gap and additionally recovers missing dynamics in highly complex systems, showing that the combination of efficient temporal encoding and physics-informed correction enables robust data assimilation.
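To illustrate the idea of identifying missing dynamics by sparse regression in a latent space, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional toy latent variable `z`, a hypothetical simulation model dz/dt = -z, and a hypothetical "real" system with an extra cubic term. The helper `stlsq` is a plain NumPy version of sequentially thresholded least squares, the regression commonly used in SINDy; all names, the candidate library, and the threshold are illustrative assumptions.

```python
import numpy as np

def stlsq(theta, target, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Finds a sparse coefficient vector xi such that theta @ xi ~ target.
    """
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving library columns.
        xi[active] = np.linalg.lstsq(theta[:, active], target, rcond=None)[0]
    return xi

# Toy latent samples (assumption: stands in for a learned latent variable).
z = np.linspace(-2.0, 2.0, 200)

# Assumed simulation dynamics and assumed "real" dynamics in the latent space.
dz_sim = -z
dz_real = -z - 0.5 * z**3

# The discrepancy is whatever the simulation model fails to explain.
residual = dz_real - dz_sim

# Candidate library of polynomial terms up to third order.
names = ["1", "z", "z^2", "z^3"]
theta = np.column_stack([np.ones_like(z), z, z**2, z**3])

xi = stlsq(theta, residual)
discovered = {n: round(float(c), 6) for n, c in zip(names, xi) if c != 0.0}
print(discovered)
```

In this noise-free toy setting the regression keeps only the cubic term, which is the term deliberately left out of the assumed simulation model. With real sensor data the derivatives would have to be estimated and the threshold tuned, which this sketch does not attempt.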
