Non-equilibrium active noise enhances generative memory in diffusion models

Agnish Kumar Behera, Alexandra Lamtyugina, Aditya Nandy, Daiki Goto, Carlos Floyd, Suriyanarayanan Vaikuntanathan

TL;DR

The paper shows that driving diffusion models with non-equilibrium, active noise fundamentally changes information flow, enabling memory of high-level concepts to be stored in temporal correlations of auxiliary variables. By formulating an active forward process and a corresponding reverse diffusion with scores on the active degrees of freedom, the authors demonstrate slower information decay via Fisher memory curves and earlier, robust speciation in the reverse process. Across toy models, alanine dipeptide, and large-scale datasets like MNIST and CIFAR-10, active diffusion yields sharper, more faithful multi-scale structures and improved fidelity (lower FID) without extra training tricks. These results suggest a thermodynamically distinct and practically advantageous route—active generative AI—for exploring rugged energy landscapes and retaining semantic information during sampling, with potential to simplify learning through physics-informed dynamics.

Abstract

Generative diffusion models have emerged as powerful tools for sampling high-dimensional distributions, yet they typically rely on white Gaussian noise and noise schedules to destroy and reconstruct information. Here, we demonstrate that driving the generative process out of equilibrium using active, temporally correlated noise sources fundamentally alters the information thermodynamics of the system. We show that coupling the data to an active non-Markovian bath creates a 'memory effect' where high-level semantic information (such as class identity or molecular metastability) is stored in the temporal correlations of auxiliary degrees of freedom. Using Fisher information analysis, we prove that this active mechanism significantly retards the rate of information decay compared to passive Brownian motion. Crucially, this memory effect facilitates an earlier and more robust symmetry breaking (speciation) during the reverse generative process, allowing the system to resolve multi-scale structures, reminiscent of metastable states in molecular configurations that are washed out in the typical noising processes. Our results suggest that non-equilibrium protocols, inspired by active matter physics, offer a thermodynamically distinct and potentially advantageous pathway for recovering high-dimensional energy landscapes using generative diffusion.
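To make the idea of a temporally correlated noise source concrete, here is a minimal sketch of a forward noising process in which the data variable is driven by an auxiliary Ornstein-Uhlenbeck variable. The governing equations are not stated in this excerpt, so the SDEs below are an assumed active Ornstein-Uhlenbeck form; the symbols `k`, `T_a`, `T_p` and `tau` are borrowed from the figure captions, and the function name `active_forward` and all default values are hypothetical.

```python
import numpy as np

def active_forward(x0, n_steps, dt, k=1.0, T_a=1.0, T_p=0.0, tau=0.5, seed=0):
    """Euler-Maruyama integration of an ASSUMED active noising process:

        dx   = (-k * x + eta) dt + sqrt(2 * T_p) dW_x
        deta = -(eta / tau) dt   + (sqrt(2 * T_a) / tau) dW_eta

    eta is the auxiliary, temporally correlated noise variable.
    Setting T_a = 0 (with eta starting at zero) recovers a passive process.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    eta = np.zeros_like(x)  # assumption: the auxiliary variable starts at zero
    for _ in range(n_steps):
        dW_x = rng.normal(size=x.shape) * np.sqrt(dt)
        dW_eta = rng.normal(size=x.shape) * np.sqrt(dt)
        # Update x using the current eta, then update eta.
        x_new = x + (-k * x + eta) * dt + np.sqrt(2.0 * T_p) * dW_x
        eta = eta - (eta / tau) * dt + (np.sqrt(2.0 * T_a) / tau) * dW_eta
        x = x_new
    return x, eta

if __name__ == "__main__":
    x0 = np.ones(20000)
    x, eta = active_forward(x0, n_steps=500, dt=0.01)
    # In this sketch x and eta end up statistically coupled.
    print("corr(x, eta) =", np.corrcoef(x, eta)[0, 1])
```

In this sketch the noised data and the auxiliary variable become correlated, so the joint state `(x, eta)` carries more structure than `x` alone. That is the sense in which the auxiliary degrees of freedom can act as a memory; the sketch does not reproduce the Fisher information analysis itself.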

Paper Structure

This paper contains 30 sections, 43 equations, 15 figures.

Figures (15)

  • Figure 1: Schematic for the forward and backward diffusion processes. (a) Active diffusion correlates a noise variable $\eta$ with the data degrees of freedom ${\bf x}$ during generative diffusion processes. (b), (c), (d) and (e) denote passive and active reverse diffusion for two one-dimensional distributions: (b,d) is a coarser distribution with three almost non-overlapping peaks of nearly equal weight, whereas (c,e) is a finer distribution with five closely overlapping peaks, four of which have a much smaller weight than the one large peak. For the distribution in (c,e), the active process better resolves the finer-scale features.
  • Figure 2: Generation of samples in a Gaussian mixture distribution via reverse diffusion with (a) analytical scores and (b) scores learned with a neural network from training data. Each scatter plot contains 10,000 2D samples, colored by sample density for visual clarity (higher densities are indicated by green points and lower densities by purple). (a) Active diffusion outperforms the passive case for larger $dt$, the time step size of the reverse diffusion trajectory. As $dt$ decreases, the performance of passive diffusion becomes comparable to that of active diffusion. For very small $dt$, both passive and active diffusion accurately reproduce the target distribution. (b) When neural networks are used to approximate the score function in the reverse diffusion process, passive and active diffusion show trends similar to those seen with the analytic score function. Both improve as $dt$ is decreased, but the overall performance is worse than in the analytic example. The step size $dt$ determines the number of sampling steps in the reverse diffusion process. A smaller $dt$ implies a larger number of timesteps, which incurs a large overhead because the neural network is called at every step.
  • Figure 3: (a) Generation of samples for a 2D distribution of multiple overlapping Swiss rolls via reverse diffusion with the score function approximated by a neural network. (b) Ramachandran plots ($\phi$, $\psi$) in degrees for 1 $\mu$s of molecular dynamics sampling for a water-solvated alanine dipeptide (left) and corresponding diffusion-generated samples with passive (center) and active ($\tau=0.5$, right) diffusion. The score model was trained using an MLP on a 2D dataset consisting of dihedral angle pairs $(\phi, \psi)$. (c) The first two principal components of alanine dipeptide configuration parameters. The score model was trained using a U-Net on a 25D dataset consisting of bond lengths, bond angles, and dihedral angles.
  • Figure 4: Fréchet Inception Distance (FID) scores as a function of training epoch number for passive (blue) and active (red) diffusion models with $\tau = 0.5$. Representative generated digits are shown for models at every 20 epochs of training.
  • Figure 5: Comparison of images generated with active and passive versions of generative diffusion on the CIFAR-10 dataset. Parameters used: $k=4$, $T_a=6.4$, $\tau=0.15$ for active, and similar $k, T_p$ parameters for passive. A total of $\sim \times 10^5$ steps were used in both cases. The active version performs better than its passive counterpart, in line with previous findings.
  • ...and 10 more figures
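
The Figure 2 caption ties the step size $dt$ to the number of reverse-diffusion steps, and hence to the number of score evaluations. The sketch below illustrates that relationship for the passive case only, using a 1D Gaussian mixture whose score is known in closed form under an Ornstein-Uhlenbeck forward process. It is not the authors' active scheme; the forward process, the function names `mixture_score` and `reverse_sample`, and every parameter value are assumptions made for illustration.

```python
import numpy as np

def mixture_score(x, t, means, weights, s=0.3, k=1.0, T=1.0):
    """Analytic score of a 1D Gaussian mixture noised for time t by the
    ASSUMED passive process dx = -k x dt + sqrt(2 T) dW."""
    decay = np.exp(-k * t)
    m = means * decay                                   # noised component means
    v = s**2 * decay**2 + (T / k) * (1.0 - decay**2)    # shared noised variance
    logp = np.log(weights) - 0.5 * (x[:, None] - m[None, :]) ** 2 / v
    logp -= logp.max(axis=1, keepdims=True)             # numerical stability
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)                   # component responsibilities
    return (r * (m[None, :] - x[:, None])).sum(axis=1) / v

def reverse_sample(n_samples, dt, means, weights, t_final=5.0,
                   s=0.3, k=1.0, T=1.0, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE, from t_final to 0."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_final / dt))                  # one score call per step
    x = rng.normal(scale=np.sqrt(T / k), size=n_samples)  # stationary prior
    for i in range(n_steps):
        t = t_final - i * dt
        drift = k * x + 2.0 * T * mixture_score(x, t, means, weights, s, k, T)
        x = x + drift * dt + np.sqrt(2.0 * T * dt) * rng.normal(size=n_samples)
    return x, n_steps

if __name__ == "__main__":
    means = np.array([-2.0, 0.0, 2.0])
    weights = np.ones(3) / 3.0
    for dt in (0.5, 0.01):
        samples, n_steps = reverse_sample(5000, dt, means, weights)
        print(f"dt={dt}: {n_steps} score evaluations, sample std={samples.std():.2f}")
```

With a learned score, each of the `n_steps` iterations would be a neural network call, which is why a smaller $dt$ is more expensive.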