Physically Interpretable Representation and Controlled Generation for Turbulence Data
Tiffany Fan, Murray Cutforth, Marta D'Elia, Alexandre Cortiella, Alireza Doostan, Eric Darve
TL;DR
The paper tackles the computational cost of high-fidelity CFD by learning a compact, physically meaningful representation of turbulent flow data. It uses a Gaussian Mixture Variational Autoencoder (GMVAE) to produce a structured latent space aligned with physical conditions such as the Reynolds number $Re$, and it introduces a graph spectral interpretability metric that quantifies the smoothness of physical quantities along the latent manifold. On a 2D Navier-Stokes benchmark (flow past a cylinder) across $Re \in [98,2000]$, the GMVAE outperforms Isomap, UMAP, and a standard VAE in clustering quality and latent disentanglement. It also enables conditional generation of velocity and pressure fields via a post hoc MLP that maps $Re$ to latent coordinates. The approach offers a principled, data-driven route for turbulence modeling that preserves physical structure and could support low-fidelity surrogates and cross-modal analyses in engineering systems.
Abstract
Computational Fluid Dynamics (CFD) plays a pivotal role in fluid mechanics, enabling precise simulations of fluid behavior through partial differential equations (PDEs). However, traditional CFD methods are resource-intensive, particularly for high-fidelity simulations of complex flows, which are further complicated by high dimensionality, inherent stochasticity, and limited data availability. This paper addresses these challenges by proposing a data-driven approach that leverages a Gaussian Mixture Variational Autoencoder (GMVAE) to encode high-dimensional scientific data into low-dimensional, physically meaningful representations. The GMVAE learns a structured latent space where data can be categorized based on physical properties such as the Reynolds number while maintaining global physical consistency. To assess the interpretability of the learned representations, we introduce a novel metric based on graph spectral theory, quantifying the smoothness of physical quantities along the latent manifold. We validate our approach using 2D Navier-Stokes simulations of flow past a cylinder over a range of Reynolds numbers. Our results demonstrate that the GMVAE provides improved clustering, meaningful latent structure, and robust generative capabilities compared to baseline dimensionality reduction methods. This framework offers a promising direction for data-driven turbulence modeling and broader applications in computational fluid dynamics and engineering systems.
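To make the idea of "smoothness of physical quantities along the latent manifold" concrete, the following is a minimal sketch of one common graph spectral smoothness measure, not necessarily the exact metric defined in the paper. It assumes a symmetrized k-nearest-neighbor graph on the latent points and scores a physical quantity (here a stand-in for $Re$) by the Rayleigh quotient $f^\top L f / f^\top f$ of the unnormalized graph Laplacian $L$; lower values mean the quantity varies more smoothly over the graph. The function names, the choice of `k`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def knn_graph_laplacian(z, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph on points z (n, d)."""
    n = z.shape[0]
    # Pairwise squared Euclidean distances between latent points.
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    w = np.maximum(w, w.T)  # symmetrize: keep an edge if either end selected it
    return np.diag(w.sum(axis=1)) - w


def smoothness_score(z, f, k=5):
    """Rayleigh quotient of the mean-centered signal f on the kNN graph of z.

    Lower values indicate that f changes little between neighboring latent points.
    """
    lap = knn_graph_laplacian(z, k)
    f = f - f.mean()
    return float(f @ lap @ f) / float(f @ f)


# Synthetic illustration (not data from the paper): 2D "latent" points where
# the physical quantity either follows the first latent coordinate or is
# randomly reassigned across points.
rng = np.random.default_rng(0)
z = rng.uniform(size=(200, 2))
re_aligned = 98.0 + (2000.0 - 98.0) * z[:, 0]  # varies smoothly with z
re_shuffled = rng.permutation(re_aligned)      # same values, no latent structure

score_aligned = smoothness_score(z, re_aligned)
score_shuffled = smoothness_score(z, re_shuffled)
print(score_aligned, score_shuffled)
```

In this toy setting the aligned signal yields a much smaller score than the shuffled one, which is the behavior an interpretability metric of this kind is meant to capture: a latent space organized by $Re$ places samples with similar $Re$ next to each other.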
