Associative Memory and Generative Diffusion in the Zero-noise Limit

Joshua Hess, Quaid Morris

TL;DR

This work provides a global geometric framework linking memory and generation by showing that Morse-Smale gradient systems universally approximate energy-based associative memories and that diffusion processes converge to these memory landscapes in the zero-noise limit. It proves that under Morse-Smale and related generic conditions, trajectories and invariant measures are structurally stable, with memory landscapes organized as invariant DAGs reflecting stable connections among memories. The paper extends these ideas to diffusion models, establishing that their zero-noise limits concentrate on stable memories and that generation dynamics can be described by monotone, bifurcation-driven changes in gradient fields. By unifying energy-based models, Hopfield-type networks, and denoising diffusion models under a single geometric paradigm, it offers a robust lens to study learning, memory consolidation, and the memory-generation transition across a broad class of neural architectures.

Abstract

This paper shows that generative diffusion processes converge to associative memory systems at vanishing noise levels and characterizes the stability, robustness, memorization, and generation dynamics of both model classes. Morse-Smale dynamical systems are shown to be universal approximators of associative memory models, with diffusion processes as their white-noise perturbations. The universal properties of associative memory that follow are used to characterize a generic transition from generation to memory as noise diminishes. Structural stability of Morse-Smale flows -- that is, the robustness of their global critical point structure -- implies the stability of both trajectories and invariant measures for diffusions in the zero-noise limit. The learning and generation landscapes of these models appear as parameterized families of gradient flows and their stochastic perturbations, and the bifurcation theory for Morse-Smale systems implies that they are generically stable except at isolated parameter values, where enumerable sets of local and global bifurcations govern transitions between stable systems in parameter space. These landscapes are thus characterized by ordered bifurcation sequences that create, destroy, or alter connections between rest points and are robust under small stochastic or deterministic perturbations. The framework is agnostic to model formulation, which we verify with examples from energy-based models, denoising diffusion models, and classical and modern Hopfield networks. We additionally derive structural stability criteria for Hopfield-type networks and find that simple cases violate them. Collectively, our geometric approach provides insight into the classification, stability, and emergence of memory and generative landscapes.
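The idea of a diffusion as a white-noise perturbation of a gradient flow can be made concrete with a small simulation. This is a minimal sketch with two assumptions. First, it uses the common overdamped Langevin parameterization $dX = -\nabla V(X)\,dt + \sqrt{2\epsilon}\,dW$; the paper's exact noise scaling is not stated in this summary. Second, it uses a hypothetical dual-well potential $V(v) = (v_1^2-1)^2/4 + v_2^2/2$ whose memories (attractors) sit at $(\pm 1, 0)$. The code integrates the diffusion with the Euler-Maruyama scheme, starting from a corrupted pattern in the basin of $(1, 0)$. As $\epsilon$ shrinks, the endpoint settles ever more tightly at that memory. The helper names are our own.

```python
import numpy as np

def grad_V(v):
    """Gradient of the hypothetical dual-well potential V(v) = (v1^2 - 1)^2 / 4 + v2^2 / 2."""
    return np.array([v[0] * (v[0] ** 2 - 1.0), v[1]])

def euler_maruyama(v0, eps, dt=1e-2, n_steps=5000, seed=0):
    """Simulate dX = -grad V(X) dt + sqrt(2 eps) dW (assumed scaling); return the whole path."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v0, dtype=float)
    path = np.empty((n_steps + 1, v.size))
    path[0] = v
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(v.size)  # Brownian increment over dt
        v = v - grad_V(v) * dt + np.sqrt(2.0 * eps) * dW
        path[k + 1] = v
    return path

corrupted = [0.3, 0.8]  # a corrupted pattern in the basin of the memory (1, 0)
for eps in (1.0, 0.1, 0.001, 0.0):
    end = euler_maruyama(corrupted, eps)[-1]
    print(f"eps={eps:g}: final state {np.round(end, 3)}")
```

With $\epsilon = 0$ the scheme reduces to explicit Euler on the gradient flow, so recall is deterministic. With large $\epsilon$, the path can hop between wells.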

Paper Structure

This paper contains 72 sections, 17 theorems, 95 equations, 11 figures, 1 algorithm.

Key Result

Theorem 8

Let $X = -\nabla_g V$ be a gradient field on a Riemannian $n$-manifold $(M,g)$ with $V \in C^{r+1}(M,\mathbb{R}), r \geq 1$ a Morse function. Let $p \in M$ be a hyperbolic fixed point of $X$ (a nondegenerate critical point of $V$) with index $\lambda_p$, and $E^s$ (resp. $E^u$) the stable (resp. unstable) subspace of $DX_p = L$. Then the
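The following is a standard fact for gradient fields. In Euclidean coordinates, the linearization of $X = -\nabla V$ at a critical point is $DX_p = -\mathrm{Hess}\,V(p)$. For a general metric $g$ it is $-g^{-1}\mathrm{Hess}\,V(p)$, which has the same sign pattern. Hence $E^u$ corresponds to negative Hessian eigenvalues, which gives $\dim E^u = \lambda_p$ and $\dim E^s = n - \lambda_p$. The code below is a minimal sketch that checks this. It assumes the Euclidean metric and a hypothetical Morse function $V(v) = (v_1^2-1)^2/4 + v_2^2/2$; the function names are illustrative only.

```python
import numpy as np

def hessian_V(v):
    """Hessian of the hypothetical Morse function V(v) = (v1^2 - 1)^2 / 4 + v2^2 / 2."""
    return np.array([[3.0 * v[0] ** 2 - 1.0, 0.0],
                     [0.0, 1.0]])

def stable_unstable_dims(p, tol=1e-12):
    """Return (index, dim E^s, dim E^u) at a critical point p of V, for X = -grad V (Euclidean metric)."""
    H = hessian_V(np.asarray(p, dtype=float))
    L = -H                                   # linearization DX_p of X = -grad V
    eig_L = np.linalg.eigvalsh(L)            # real eigenvalues, since L is symmetric here
    if np.any(np.abs(eig_L) < tol):
        raise ValueError("p is not a hyperbolic fixed point")
    index = int(np.sum(np.linalg.eigvalsh(H) < 0))  # Morse index lambda_p of V at p
    dim_s = int(np.sum(eig_L < 0))           # contracting directions span E^s
    dim_u = int(np.sum(eig_L > 0))           # expanding directions span E^u
    return index, dim_s, dim_u

for p in ([-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]):
    print(p, stable_unstable_dims(p))
```

The minima at $(\pm 1, 0)$ give $(0, 2, 0)$ and the saddle at the origin gives $(1, 1, 1)$. In each case, $\dim E^u = \lambda_p$ and $\dim E^s = n - \lambda_p$.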

Figures (11)

  • Figure 1: Small random perturbations and zero-noise limits. (a) Symbols representing phase portraits and probability measures throughout. (b) Trajectories (grey) of diffusion processes $\mathcal{X}^{\epsilon}$ at varying noise levels (left to right) overlaid on the energy surface (light blue interior, magenta exterior) of the associative memory model in Example 1. As $\epsilon \rightarrow 0$, trajectories approach those of the deterministic system: $\{\mathcal{X}^{\epsilon}\}$ are small random perturbations of the gradient flow. Attractors are memories (black circles). Recall from partial or corrupted patterns (white circles) is given by the asymptotic behavior of trajectories (black lines). (c) Invariant measures $\mu^{\epsilon}(dv)$ of $\mathcal{X}^{\epsilon}$ are Boltzmann-Gibbs distributions. As $\epsilon \rightarrow 0$, the sequence $\{ \mu^{\epsilon}\}$ converges to the zero-noise limit of $\{\mathcal{X}^{\epsilon}\}$.
  • Figure 2: Generic phase space decomposition and hierarchical organization (DAG) of associative memory. (a) Phase portraits of Morse-Smale gradients overlaid on their energy surfaces. The two-attractor system (top) is the dual-well model of Example 1. A dual cusp geometry describes the three-attractor system (bottom) and corresponds to the potential $V= \frac{1}{10} \left( v_1^4 + v_2^4 - 3v_2^3 + 7v_2v_1^2 + \frac{1}{10}v_2^2 - 2v_2 \right)$. Saddles (white crosses) and attractors (black circles) are labeled alphabetically. (b) Decomposition of the phase space into disjoint stable manifolds of each critical element (red, green, and blue shades). (c) Invariant DAGs of the dual-well model (top) and dual cusp model (bottom). Nodes correspond to critical elements and are ordered by their index. Top-layer nodes correspond to index 1 saddle points, with edges to index 0 attractors (memories).
  • Figure 3: Saddle-node (fold) bifurcation. (a) Trajectories (grey) representing solutions to the one-parameter family of gradients from the saddle-node example (left to right), overlaid on their respective energy surfaces. As the parameter value $\eta \in [-1,1]$ changes from $\eta=-1$ to $\eta=0$ (left to middle), an attractor and a saddle are born, indicating a supercritical fold bifurcation. From $\eta=0$ to $\eta=1$, the saddle and the opposite attractor are destroyed, corresponding to the subcritical case. (b) Invariant measures $\mu^{\epsilon}(dv)$ of small random perturbations $\mathcal{X}^{\epsilon}$ of each gradient flow from $\eta=-1$ to $\eta=1$ with $\epsilon=1$. As $\epsilon \rightarrow 0$, the invariant measure will concentrate on the attractors.
  • Figure 4: Heteroclinic flip bifurcation. (a) Trajectories (grey) of the one-parameter family of gradients in the heteroclinic flip example (left to right). As the parameter value $\eta$ changes from $\eta=-1$ to $\eta=0$ (left to middle), the unstable manifold of a saddle (right-hand side) shifts from intersecting the stable manifold of the top attractor to intersecting the stable manifold of the other saddle, creating a saddle-saddle connection (red line). From $\eta=0$ to $\eta=1$, the saddle connection is destroyed and the unstable manifold of the right-hand side saddle intersects the stable manifold of the opposite attractor (bottom). The arc of gradient fields from $\eta=-1$ to $\eta=1$ encounters a heteroclinic flip bifurcation. (b) Invariant measures $\mu^{\epsilon}(dv)$ of small random perturbations $\mathcal{X}^{\epsilon}$ with $\epsilon=1$.
  • Figure 5: Codimension-one bifurcations during the learning dynamics of energy-based generative models. (a) Trajectories (grey) and critical points of a one-parameter family of gradients (left to right) derived from the potential of an energy-based model trained with contrastive divergence to generate four centroids in $\mathbb{R}^2$. The model was pretrained to generate a centroid at the origin. The energy was parameterized by a three-layer multilayer perceptron with softplus activations and a hidden dimensionality of 128. It was regularized by adding a quadratic term $\frac{1}{2}(\max_x ||x||)^2$ over the training data, which encourages a barrier at the max norm of the generated data. As the optimization index increases from $\eta_0 = 0$ to $\eta_{\text{final}} = 2999$, a sequence of bifurcations occurs. Representative bifurcations during training are shown; additional bifurcations occur but are not shown. DAGs at selected parameter values are shown as insets. A supercritical saddle-node occurs from $\eta=0$ to $\eta=301$, creating two attractors. Another saddle-node creates a third attractor and an index 1 saddle from $\eta=301$ to $\eta=302$. A heteroclinic flip appears to occur between $\eta=1496$ and $\eta=1497$. It causes the unstable manifold of the top saddle to switch from intersecting the stable manifold of the bottom-left attractor to intersecting that of the bottom-right attractor. A final supercritical saddle-node creates the fourth attractor.
  • ...and 6 more figures
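The critical points and invariant DAG of the dual cusp potential in Figure 2 can be recovered numerically. This is a minimal sketch under our own assumptions, none of which come from the paper: a Euclidean metric, Newton's method from a grid of starting points, explicit Euler steps along $-\nabla V$, and illustrative step sizes and tolerances. The code first reads each critical point's index from the number of negative Hessian eigenvalues. It then follows both branches of each index 1 saddle's unstable manifold until they reach an attractor. The helper names are hypothetical.

```python
import numpy as np

def grad_V(v):
    """Gradient of the dual cusp potential V from Figure 2."""
    x, y = v
    return 0.1 * np.array([4 * x**3 + 14 * x * y,
                           4 * y**3 - 9 * y**2 + 7 * x**2 + 0.2 * y - 2])

def hess_V(v):
    """Hessian of the same potential."""
    x, y = v
    return 0.1 * np.array([[12 * x**2 + 14 * y, 14 * x],
                           [14 * x, 12 * y**2 - 18 * y + 0.2]])

def critical_points(lo=-3.0, hi=3.0, n=25, tol=1e-10):
    """Locate zeros of grad V with Newton's method from a grid of starting points."""
    found = []
    for x0 in np.linspace(lo, hi, n):
        for y0 in np.linspace(lo, hi, n):
            v = np.array([x0, y0])
            for _ in range(100):
                try:
                    v = v - np.linalg.solve(hess_V(v), grad_V(v))
                except np.linalg.LinAlgError:  # exactly singular Hessian at this iterate
                    break
                if not np.all(np.isfinite(v)) or np.linalg.norm(v) > 1e3:
                    break  # diverged; discard this start
            converged = (np.all(np.isfinite(v)) and np.linalg.norm(v) <= 1e3
                         and np.linalg.norm(grad_V(v)) < tol)
            if converged and all(np.linalg.norm(v - w) > 1e-6 for w in found):
                found.append(v)
    return found

def descend(v, h=0.05, n_steps=20000, tol=1e-10):
    """Follow the gradient flow dv/dt = -grad V(v) with explicit Euler steps."""
    for _ in range(n_steps):
        g = grad_V(v)
        if np.linalg.norm(g) < tol:
            break
        v = v - h * g
    return v

def invariant_dag(points, delta=1e-3):
    """Return Morse indices and edges (saddle, attractor) reached by each unstable branch."""
    eigs = [np.linalg.eigh(hess_V(p)) for p in points]
    index = [int(np.sum(w < 0)) for w, _ in eigs]  # Morse index = dim E^u for X = -grad V
    attractors = [i for i, k in enumerate(index) if k == 0]
    edges = set()
    for i, (w, U) in enumerate(eigs):
        if index[i] != 1:
            continue
        u = U[:, np.argmin(w)]  # eigenvector of the negative Hessian eigenvalue
        for sign in (1.0, -1.0):
            end = descend(points[i] + sign * delta * u)
            j = min(attractors, key=lambda a: np.linalg.norm(end - points[a]))
            edges.add((i, j))
    return index, edges

pts = critical_points()
index, edges = invariant_dag(pts)
for i, p in enumerate(pts):
    print(i, np.round(p, 3), "index", index[i])
for i, j in sorted(edges):
    print(f"saddle {i} -> attractor {j}")
```

For this potential the search should return three attractors and two index 1 saddles, consistent with the three-attractor panel. The Euler step $h = 0.05$ is well below $2$ divided by the largest Hessian eigenvalue at the attractors, so the descent stays stable near the memories.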

Theorems & Definitions (38)

  • Definition 1: Stationary and equilibrium distributions
  • Definition 2: Small random perturbation, adapted from cowieson2005srb
  • Definition 3: Zero-noise limit, cowieson2005srb
  • Definition 4: Structural stability
  • Example 1: Bistable associative memory (Figure 1)
  • Definition 5: Non-wandering set
  • Definition 6: Morse-Smale gradient
  • Definition 7: Compact-open $C^r$ topology
  • Theorem 8: Stable Manifold Theorem for Morse functions
  • Proposition 9: Generic zero-noise limits.
  • ...and 28 more
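Definitions 1 to 3 and Proposition 9 concern invariant measures and their zero-noise limits, and a short numerical check can make these concrete. This is a minimal sketch built on the diffusion $dX = -\nabla V(X)\,dt + \sqrt{2\epsilon}\,dW$, which is an assumed parameterization. The invariant measure of this diffusion is the Boltzmann-Gibbs density proportional to $e^{-V/\epsilon}$. The sketch uses a hypothetical symmetric dual-well potential $V(v) = (v_1^2-1)^2/4 + v_2^2/2$, whose $v_1$-marginal is proportional to $e^{-(v_1^2-1)^2/(4\epsilon)}$. The code measures how much of this marginal lies within $0.25$ of the memories $v_1 = \pm 1$ as $\epsilon$ decreases. The grid, radius, and function names are illustrative choices.

```python
import numpy as np

def gibbs_density(eps, x):
    """Normalized density proportional to exp(-V(x)/eps) for V(x) = (x^2 - 1)^2 / 4 on a uniform grid x."""
    V = (x**2 - 1.0) ** 2 / 4.0
    w = np.exp(-(V - V.min()) / eps)  # shift by min(V) to avoid underflow as eps -> 0
    dx = x[1] - x[0]
    return w / (w.sum() * dx)

def mass_near_memories(eps, radius=0.25):
    """Approximate probability mass within `radius` of the memories at -1 and +1."""
    x = np.linspace(-3.0, 3.0, 6001)
    p = gibbs_density(eps, x)
    dx = x[1] - x[0]
    left = p[np.abs(x + 1.0) < radius].sum() * dx
    right = p[np.abs(x - 1.0) < radius].sum() * dx
    return left, right

for eps in (1.0, 0.1, 0.01):
    left, right = mass_near_memories(eps)
    print(f"eps={eps:g}: mass near -1 = {left:.3f}, near +1 = {right:.3f}")
```

Both wells have equal depth in this potential, so the mass splits evenly between the two memories. The total mass near them approaches one as $\epsilon \rightarrow 0$.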