Comparison of Generative Learning Methods for Turbulence Surrogates

Claudia Drygala, Edmund Ross, Francesca di Mare, Hanno Gottschalk

TL;DR

The paper tackles the challenge of efficiently representing turbulent flows by benchmarking three generative models (VAE, DCGAN, and DDPM) as surrogates for turbulence. It applies these models to the 2D-projected wake flow of a cylinder simulated with high-fidelity LES and to experimental PIV data of the wake of a cylinder array, assessing statistical fidelity and preservation of spatial structure. The key finding is that both DCGAN and DDPM can reproduce the flow fields, with DCGAN offering better data efficiency, faster training and inference, and closer alignment with the input stream, while VAE underperforms and DDPM incurs higher computational costs. This work demonstrates the practicality of GAN-based turbulence surrogates for rapid data generation and uncertainty quantification, and it suggests diffusion models as a slower but still viable alternative, potentially extendable to broader experimental datasets and higher Reynolds numbers.

Abstract

Numerical simulations of turbulent flows present significant challenges in fluid dynamics due to their complexity and high computational cost. High-resolution techniques such as Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) are generally not computationally affordable, particularly for technologically relevant problems. Recent advances in machine learning, specifically in generative probabilistic models, offer promising alternatives as surrogates for turbulence. This paper investigates the application of three generative models - Variational Autoencoders (VAE), Deep Convolutional Generative Adversarial Networks (DCGAN), and Denoising Diffusion Probabilistic Models (DDPM) - in simulating a von Kármán vortex street around a fixed cylinder projected into 2D, as well as a real-world experimental dataset of the wake flow of a cylinder array. Training data was obtained by means of LES in the simulated case and Particle Image Velocimetry (PIV) in the experimental case. We evaluate each model's ability to capture the statistical properties and spatial structures of the turbulent flow. Our results demonstrate that DDPM and DCGAN effectively replicate all flow distributions, highlighting their potential as efficient and accurate tools for turbulence surrogacy. We find a strong argument for DCGAN: although they are more difficult to train (due to problems such as mode collapse), they show the fastest inference and training times, require less training data than VAE and DDPM, and provide the results most closely aligned with the input stream. In contrast, VAE train quickly (and can generate samples quickly) but do not produce adequate results, and DDPM, whilst effective, are significantly slower at both inference and training.

Paper Structure

This paper contains 31 sections, 25 equations, 21 figures, 2 tables.

Figures (21)

  • Figure 1: Comparison of the classical [kramer1991nonlinear] (top) and the variational [kingma2013auto] (bottom) autoencoder architectures. In the case of the classical AE, the encoder $f_\theta$ receives as input a real-world image $x$ whose most important features are encoded in the latent space $\mathcal{Z}'$. Sending a point of this latent space back to the decoder $g_\phi$ should result in a reconstructed image $x'\approx x$. In contrast, in the case of the VAE, the real-world images have a chosen distribution, making the encoder and decoder probabilistic. The encoder learns the distribution parameters of the input images, and latent samples are drawn during training by the re-parameterization trick, making it possible to compute the gradients for $\mu$ and $\sigma$. Due to the stochasticity of the probabilistic decoder's input, the resulting images are generated rather than reconstructed, as they may differ from the original images.
  • Figure 2: Architecture of a deep convolutional GAN [vanilla_gan, dcgan]. While the training data is given by the real-world data $\mathcal{X}$, the fake samples $G(\bm{z})\sim G_*\lambda$ are produced by the generator from a noise vector $\bm{z}$. The discriminator receives both GAN-synthesized and real-world samples, and its task is to estimate the probability, in the range $[0,1]$, that an input sample comes from $\mathcal{X}$ rather than being generated by $G$. During training, the feedback from the discriminator reaches the generator by backpropagation. The entire GAN framework can be backpropagated at once, since $G$ and $D$ are both fully differentiable and trained end-to-end. If the unknown distribution of the real-world data is approximated by $G$ (i.e. $G_*\lambda\approx\mu$) and $D$ can do no better than guess whether a sample is real or fake (i.e. $D(\cdot)\approx\frac{1}{2}$), the problem \ref{eq:vanilla_gan_optimization_problem} has reached its optimum.
  • Figure 3: Denoising diffusion probabilistic model (DDPM) architecture [ho2020denoising]. At training time, a sample from the dataset $x \sim \Omega$ is noised to a random step $t$ in the noising process, producing a partially noised sample $x_t \sim \mathcal{N}(x_{t-1}\sqrt{1 - \beta_t}, \beta_t I)$. The UNet is trained to produce $x_{t-1}$ from $x_t$ with the parameter $t$ given as a positional embedding. At inference time, a normal sample $x_T = z \sim \mathcal{N}(0, 1)$ is produced and fed to the UNet, which reverses the noising steps one at a time and produces a new sample $x' \sim \Omega'$.
  • Figure 4: Numerical domain (a) and field of view (b) for the numerical and experimental test cases, respectively. Flow fields for the cylinder arrays (c).
  • Figure 5: Examples from the dataset of LES simulated flow around a cylinder.
  • ...and 16 more figures
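
The sampling steps named in the captions of Figures 1 and 3 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 64×64 array standing in for a flow-field snapshot, and the linear $\beta_t$ schedule are assumptions made purely for illustration.

```python
import numpy as np


def reparameterize(mu, sigma, rng):
    """Re-parameterization trick (Figure 1): z = mu + sigma * eps, eps ~ N(0, I).

    The randomness lives in eps, so z is a differentiable function of mu and sigma.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


def noising_step(x_prev, beta_t, rng):
    """One forward noising step (Figure 3): x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps


rng = np.random.default_rng(0)

# Hypothetical stand-in for one flow-field snapshot (not data from the paper).
x0 = rng.standard_normal((64, 64))

# Assumed linear noise schedule with T = 1000 steps; the paper's schedule may differ.
betas = np.linspace(1e-4, 0.02, 1000)

x_t = x0
for beta_t in betas:
    x_t = noising_step(x_t, beta_t, rng)

# Latent sample from hypothetical encoder outputs mu and sigma.
z = reparameterize(np.zeros(8), np.ones(8), rng)
```

After enough steps, `x_t` is statistically close to pure Gaussian noise, which is the starting point $x_T$ that the UNet denoises step by step at inference time.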