Synthetic Lagrangian Turbulence by Generative Diffusion Models

Tianyi Li, Luca Biferale, Fabio Bonaccorso, Martino Andrea Scarpolini, Michele Buzzicotti

TL;DR

This work tackles the challenge of generating faithful 3D Lagrangian trajectories at high Reynolds numbers without relying on exhaustive DNS or experiments. It introduces diffusion-model generators DM-1c (single velocity component) and DM-3c (three correlated components), trained on DNS-based HIT data at $R_\lambda\simeq310$, achieving accurate reproduction of multiscale statistics, including fat-tailed velocity increments and accelerations, structure functions up to order $p=8$, and scale-by-scale exponents ${\zeta(p,\tau)}$ across inertial and dissipative ranges. The models also capture enhanced intermittency near the dissipative time ${\tau_\eta}$ and demonstrate strong generalization to extreme events beyond training data, enabling high-quality synthetic datasets for downstream applications. This diffusion-based framework offers a scalable route to abundant, physically consistent Lagrangian data and provides interpretability opportunities through the progressive multiscale denoising process and potential conditioning on flow configurations.
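The multiscale statistics named above can be estimated directly from a set of velocity time series. The following is a minimal sketch, assuming the usual definitions $S^{(p)}_\tau=\langle(\delta_\tau V_i)^p\rangle$ and $F^{(p)}_\tau=S^{(p)}_\tau/[S^{(2)}_\tau]^{p/2}$; the function names and the toy random-walk data are hypothetical and are not taken from the paper's code or dataset.

```python
import numpy as np

def velocity_increments(v, lag):
    """delta_tau V = V(t + tau) - V(t) for one velocity component.

    v has shape (n_trajectories, n_times); lag is tau in sampling steps.
    """
    return v[:, lag:] - v[:, :-lag]

def structure_function(v, lag, p):
    """S^(p)_tau: p-th moment of the increments over trajectories and time."""
    return np.mean(velocity_increments(v, lag) ** p)

def generalized_flatness(v, lag, p):
    """F^(p)_tau = S^(p)_tau / (S^(2)_tau)^(p/2)."""
    return structure_function(v, lag, p) / structure_function(v, lag, 2) ** (p / 2)

# Toy stand-in for Lagrangian velocities (assumption): a random walk with
# Gaussian increments. It is NOT turbulence; it only exercises the estimators.
rng = np.random.default_rng(0)
v = np.cumsum(rng.standard_normal((2000, 512)), axis=1)

lags = [1, 2, 4, 8, 16]
s2 = [structure_function(v, lag, 2) for lag in lags]
f4 = [generalized_flatness(v, lag, 4) for lag in lags]
```

For this Gaussian toy signal the fourth-order flatness stays close to 3 at every lag. Intermittent turbulent data would instead show a flatness that grows as $\tau$ decreases toward $\tau_\eta$.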

Abstract

Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, bio-fluids, atmosphere, oceans, and astrophysics. Despite exceptional theoretical, numerical, and experimental efforts conducted over the past thirty years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art diffusion model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to reproduce most statistical benchmarks across time scales, including the fat-tail distribution for velocity increments, the anomalous power law, and the increased intermittency around the dissipative scale. Slight deviations are observed below the dissipative scale, particularly in the acceleration and flatness statistics. Surprisingly, the model exhibits strong generalizability for extreme events, producing events of higher intensity and rarity that still match the realistic statistics. This paves the way for producing synthetic high-quality datasets for pre-training various downstream applications of Lagrangian turbulence.
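For readers unfamiliar with diffusion models, the forward (noising) half of a standard denoising diffusion process can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the number of steps `N = 1000`, and the sinusoidal placeholder trajectory are all hypothetical choices.

```python
import numpy as np

def linear_beta_schedule(n_steps, beta_min=1e-4, beta_max=2e-2):
    """Per-step noise variances (assumed linear; not the paper's schedule)."""
    return np.linspace(beta_min, beta_max, n_steps)

def forward_noise(v0, n, alpha_bar, rng):
    """Draw V_n ~ N(sqrt(alpha_bar_n) * V_0, (1 - alpha_bar_n) * I).

    Returns the noised signal and the Gaussian noise that was added.
    """
    eps = rng.standard_normal(v0.shape)
    v_n = np.sqrt(alpha_bar[n]) * v0 + np.sqrt(1.0 - alpha_bar[n]) * eps
    return v_n, eps

N = 1000                                  # hypothetical number of diffusion steps
betas = linear_beta_schedule(N)
alpha_bar = np.cumprod(1.0 - betas)       # cumulative signal retention

rng = np.random.default_rng(0)
v0 = np.sin(np.linspace(0.0, 8.0 * np.pi, 2000))   # placeholder "trajectory"
v_mid, _ = forward_noise(v0, N // 2, alpha_bar, rng)
v_end, _ = forward_noise(v0, N - 1, alpha_bar, rng)
```

At the last step the signal is essentially unit-variance Gaussian noise with no memory of `v0`. A trained network then reverses this process step by step to produce a synthetic trajectory.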

Paper Structure (10 sections, 28 equations, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Comparison between direct numerical simulations (DNS) and diffusion models (DMs). a, Standardized probability density functions (PDFs) of one generic component of the velocity increment, $\delta_\tau V_i$, at $\tau/\tau_\eta=1,2,5,100$ for ground-truth DNS data (black lines), synthetic data generated by DM-1c (blue lines with circles), and synthetic data generated by DM-1c-10% (green lines with squares), a DM-1c model trained with 10% of the DNS data. PDFs for different $\tau$ are vertically shifted for clarity. b,c,d, DM-1c trajectories for one generic velocity component with large, medium, and small time increments, $\tau/\tau_\eta=100,5,1$, respectively. e, Comparison of 3D trajectories showing small-scale vortex structures, for both DNS and DM-3c data, where different curves correspond to the three standardized velocity components, $i=x,y,z$. For the DNS, the strong oscillatory correlations between the three components are consistent with the presence of strong vortical structures. Similarly, in the case of DM-3c, these correlations can be interpreted as reflecting vortical structures within the hypothetical Eulerian flow. f, Examples of 3D trajectories reconstructed from DNS (bottom) and DM-3c (top). Note in panel a the remarkable generalizability of the data-driven DM, which explores and captures extreme velocity fluctuations of far larger intensity than those observed in the DNS dataset, visible as much more extended tails, while still preserving the ground-truth statistics of the training data. Here, the statistics for DM-1c and DM-1c-10% data are derived from 86 and 22 times the number of trajectories in the DNS, respectively.
  • Figure 2: Statistics of acceleration. Standardized PDFs of one generic component of the acceleration, $a_i$, for ground-truth DNS data (black line), synthetic data generated by DM-1c (blue line with circles), and synthetic data generated by DM-1c-10% (green line with squares). Note the ability of DM-1c to extrapolate the statistical trend to rare, intense fluctuations never encountered in the DNS data during training. The statistics of the DM-1c and DM-1c-10% data are based on 86 and 22 times the number of trajectories in the DNS, respectively. Inset: acceleration correlation function.
  • Figure 3: Illustration of the DM model and in-depth examination of its backward generation process. a, Schematic representation of the DM model and associated UNet sketch, complemented by a table of hyperparameters. Here, $N$ denotes the total number of diffusion steps and $n$ denotes the intermediate step. More details on the network architecture can be found in the Methods section and in [dhariwal2021diffusion]. b, Three distinct noise schedules for the DM's forward and backward processes explored in this study (see Methods). Points A-D indicate four different stages during the backward generation process (from ${\cal V}_N$ to ${\cal V}_0$) along the optimal noise schedule, curve (tanh6-1). At an early step of the backward process, $n=0.52N$ (D), the signal is very noisy; this is followed by two intermediate steps at $n=0.27N$ (C) and $n=0.06N$ (B), and by the final synthetic trajectory obtained for $n=0$ (A). See panel f for the corresponding illustration of one trajectory generation from D to A. c,d,e, A few statistical properties of the DM-1c signals generated at the four backward steps A-D: c, PDF of $\delta_\tau V_i$ for $\tau=\tau_\eta$; d, Second-order structure function, $S^{(2)}_\tau$; e, Fourth-order flatness, $F^{(4)}_\tau$.
  • Figure 4: Multiscale statistical properties of velocity increments. a, Log-log plot of Lagrangian structure functions, $S^{(p)}_\tau$, for $p=2,4$ and $6$, compared across DNS, DM-1c, and DM-3c. b, Log-log plot of the generalized flatness, $F^{(p)}_\tau$, for $p=4,6$ and $8$, compared across DNS, DM-1c, and DM-3c. c, Log-log plot of the $4$th-order mixed flatness, $F^{(4,ij)}_\tau$, averaged over the combinations $ij=xy,xz$ and $yz$ for both DNS and DM-3c. Error bars are computed as the min-max range over the fluctuations of 10 independent batches sub-sampled from $N_p$ trajectories for each velocity component. Error bars may appear smaller than the data points.
  • Figure 5: Scale-by-scale intermittent properties. a, Comparison between the ground-truth DNS and the two DMs, on the lin-log scale, for the 4th-order logarithmic local slope $\zeta(4,\tau)$ defined in Eq. (\ref{eq:chi}). b, The same quantity as in a, from a state-of-the-art collection of DNS [mordant2004experimental, homann2007lagrangian, biferale2005particle, fisher2008terascale, yeung2006reynolds] and experimental data [berg2006backwards, xu2006high, mordant2001measurement] (redrawn from Fig. 1 of [arneodo2008universal]). The dotted horizontal lines represent the non-intermittent dimensional scaling, $S^{(4)}_\tau \propto [S^{(2)}_\tau]^2$. Statistics and error bars in a are derived as in Fig. 4. This resulted in 30 batches for DNS and DM-3c, and 10 batches for DM-1c. The error bars in panel b are computed solely over the three different velocity components.
  • ...and 1 more figure
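
The logarithmic local slope discussed in the Figure 5 caption can be estimated numerically from tabulated structure functions. The sketch below assumes the definition $\zeta(p,\tau)=d\log S^{(p)}_\tau/d\log S^{(2)}_\tau$, which is consistent with the caption's statement that non-intermittent scaling $S^{(4)}_\tau\propto[S^{(2)}_\tau]^2$ appears as a horizontal line. The function name and the power-law test curves (including the exponent 1.6) are hypothetical and serve only as a check.

```python
import numpy as np

def local_slope(s_p, s_2):
    """zeta(p, tau): derivative of log S^(p) with respect to log S^(2).

    Both inputs are 1D arrays sampled at the same time lags; the derivative
    is taken by finite differences on the (non-uniform) log S^(2) axis.
    """
    return np.gradient(np.log(s_p), np.log(s_2))

# Synthetic power-law curves used only to verify the estimator (assumption).
tau = np.logspace(0, 2, 30)
s2 = tau ** 1.0
s4_dimensional = 3.0 * s2 ** 2.0    # non-intermittent: slope 2 at every lag
s4_anomalous = 3.0 * s2 ** 1.6      # illustrative exponent, not a measured value

zeta_dimensional = local_slope(s4_dimensional, s2)
zeta_anomalous = local_slope(s4_anomalous, s2)
```

On real data the slope is not constant: according to the caption and the abstract, it departs from the dimensional value and shows enhanced intermittency around $\tau_\eta$.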