
The Exponential Capacity of Dense Associative Memories

Carlo Lucibello, Marc Mézard

TL;DR

The paper investigates Dense Associative Memories (DAMs) that store an exponential number of patterns, $P=e^{\alpha N}$, and derives exact retrieval criteria using a Random Energy Model (REM) framework. It defines the typical-pattern retrieval threshold $\alpha_1(\lambda)$ and a lower bound on the all-pattern retrieval capacity $\alpha_c(\lambda)$. The two thresholds coincide for spherical patterns, while Gaussian patterns exhibit a gap between them and a condensation transition at $\lambda_*(\alpha,\rho)$. By analyzing basins of attraction, the paper characterizes how far from a pattern a random initial condition can start and still converge to that pattern, and highlights geometric differences between the two pattern ensembles. In a scaled dot-product regime relevant to Transformer attention, the single-pattern and all-patterns thresholds merge, underscoring the connection between DAMs and attention mechanisms. The results clarify how exponential memory capacity emerges in high dimensions and point to future directions: rigorous proofs, extensions to finite temperature, and other pattern distributions.
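The REM condensation mentioned above can be illustrated numerically. This is a sketch of the textbook REM recipe, not the paper's calculation. It assumes $e^{\alpha N}$ patterns with weights $e^{\lambda\,\mathbf{x}\cdot\boldsymbol{\xi}^\mu}$, where $\mathbf{x}$ lies on the sphere of radius $\sqrt{N}$ and the normalized overlaps $\varepsilon=\mathbf{x}\cdot\boldsymbol{\xi}^\mu/N$ have log-density $s(\varepsilon)$ per $N$. Under these assumptions the log-partition function per $N$ is $\max_{\varepsilon:\,s(\varepsilon)\ge 0}\,[s(\varepsilon)+\lambda\varepsilon]$. Condensation occurs when the maximizer is pinned at the edge $s(\varepsilon)=0$, so that a sub-exponential number of patterns dominates. The rate functions $\alpha-\varepsilon^2/2$ (Gaussian patterns) and $\alpha+\tfrac12\log(1-\varepsilon^2)$ (spherical patterns) reflect our assumed normalization and need not match the paper's $s_\rho$. The function names are hypothetical.

```python
import numpy as np

# Assumed large-deviation rate functions: log-number of patterns (per N)
# whose normalized overlap with x equals eps. Not necessarily the paper's s_rho.
def gaussian_rate(eps, alpha):
    return alpha - eps**2 / 2

def spherical_rate(eps, alpha):
    return alpha + 0.5 * np.log1p(-eps**2)

def rem_free_energy(rate, lam, eps_grid):
    """Textbook REM: return (phi, eps_star, condensed) on a grid of overlaps.

    phi = max over eps with rate(eps) >= 0 of rate(eps) + lam * eps.
    condensed is True when the maximizer sits at the edge rate(eps) ~ 0,
    i.e. a sub-exponential number of patterns dominates the sum.
    """
    s = rate(eps_grid)
    obj = np.where(s >= 0.0, s + lam * eps_grid, -np.inf)
    i = int(np.argmax(obj))
    condensed = bool(s[i] < 1e-3)
    return float(obj[i]), float(eps_grid[i]), condensed

alpha = 0.5
gauss_grid = np.linspace(-3.0, 3.0, 600_001)
sphere_grid = np.linspace(-1.0, 1.0, 200_001)[1:-1]  # open interval keeps log(1 - eps^2) finite

for lam in (0.5, 2.0):
    phi, eps, cond = rem_free_energy(lambda e: gaussian_rate(e, alpha), lam, gauss_grid)
    print(f"gaussian  lam={lam}: phi={phi:.4f} eps*={eps:.4f} condensed={cond}")
    phi, eps, cond = rem_free_energy(lambda e: spherical_rate(e, alpha), lam, sphere_grid)
    print(f"spherical lam={lam}: phi={phi:.4f} eps*={eps:.4f} condensed={cond}")
```

For the Gaussian rate function this reproduces the classical REM result: $\alpha+\lambda^2/2$ for $\lambda<\sqrt{2\alpha}$ and $\lambda\sqrt{2\alpha}$ above it. A grid search is used instead of a closed form so that the same function handles both ensembles.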

Abstract

Recent generalizations of the Hopfield model of associative memories are able to store a number $P$ of random patterns that grows exponentially with the number $N$ of neurons, $P=\exp(\alpha N)$. Besides the huge storage capacity, another interesting feature of these networks is their connection to the attention mechanism, which is part of the Transformer architectures widely applied in deep learning. In this work, we study a generic family of pattern ensembles using a statistical mechanics analysis which gives exact asymptotic thresholds for the retrieval of a typical pattern, $\alpha_1$, and lower bounds for the maximum of the load $\alpha$ for which all patterns can be retrieved, $\alpha_c$, as well as sizes of attraction basins. We discuss in detail the cases of Gaussian and spherical patterns, and show that they display rich and qualitatively different phase diagrams.
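To make "retrieval" and "attraction basin" concrete, here is a small numerical sketch in the spirit of the gradient-descent simulations of Figure 1 (bottom) and the basin measurements of Figure 2. It assumes the log-sum-exp energy familiar from modern Hopfield networks, $E(\mathbf{x})=-\frac{1}{\lambda}\log\sum_{\mu}e^{\lambda\,\mathbf{x}\cdot\boldsymbol{\xi}^\mu}$, with patterns and configuration on the sphere of radius $\sqrt{N}$. The paper's exact energy, the scaling of $\lambda$, and its simulation protocol may differ. The helper names, step size, and step count are hypothetical choices.

```python
import numpy as np

def sample_spherical(P, N, rng):
    """Draw P patterns uniformly on the sphere of radius sqrt(N)."""
    X = rng.standard_normal((P, N))
    return np.sqrt(N) * X / np.linalg.norm(X, axis=1, keepdims=True)

def init_at_angle(xi, theta, rng):
    """Random point on the sphere of radius sqrt(N) at angle theta from pattern xi."""
    N = xi.size
    e1 = xi / np.linalg.norm(xi)
    u = rng.standard_normal(N)
    u -= (u @ e1) * e1                     # make u orthogonal to the pattern
    u /= np.linalg.norm(u)
    return np.sqrt(N) * (np.cos(theta) * e1 + np.sin(theta) * u)

def retrieve(patterns, x0, lam, eta=10.0, steps=200):
    """Projected gradient descent on the ASSUMED energy
    E(x) = -(1/lam) log sum_mu exp(lam * x . xi^mu), keeping |x|^2 = N."""
    N = patterns.shape[1]
    x = np.sqrt(N) * x0 / np.linalg.norm(x0)
    for _ in range(steps):
        logits = lam * (patterns @ x)
        p = np.exp(logits - logits.max())  # numerically stable softmax over patterns
        p /= p.sum()
        grad = -(patterns.T @ p)           # dE/dx = -sum_mu p_mu xi^mu
        x = x - eta * grad
        x *= np.sqrt(N) / np.linalg.norm(x)  # project back onto the sphere
    return x

def overlap(x, xi):
    """Cosine between configuration and pattern (1 means perfect retrieval)."""
    return float(x @ xi / (np.linalg.norm(x) * np.linalg.norm(xi)))

rng = np.random.default_rng(0)
N, P = 100, 1000                           # load alpha = log(P) / N, about 0.069
xi = sample_spherical(P, N, rng)
print("from xi^1,   lam=1:    ", overlap(retrieve(xi, xi[0], 1.0), xi[0]))
print("from xi^1,   lam=0.001:", overlap(retrieve(xi, xi[0], 0.001), xi[0]))
x0 = init_at_angle(xi[0], np.pi / 3, rng)
print("from 60 deg, lam=1:    ", overlap(retrieve(xi, x0, 1.0), xi[0]))
```

In this toy setting one would expect the overlap to stay close to 1 at $\lambda=1$, including from a start $60^\circ$ away. At very small $\lambda$ it should drop far below 1, because the softmax weights become nearly uniform and the dynamics drift toward the mean pattern. This matches the qualitative picture of retrieval at higher $\lambda$, but these numbers are not the paper's thresholds.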

Paper Structure (27 sections, 79 equations, 10 figures)

Figures (10)

  • Figure 1: Top: Phase diagram for patterns uniformly distributed on the hypersphere. All the patterns are retrieved in the green region $\alpha<\alpha_c=\alpha_c^{\mathrm{lb}}=\alpha_1$ (the two thresholds coincide in the spherical setting). The dashed line $\alpha_*(\lambda)$ is where the REM condensation occurs, and the dotted one is the lower bound for the capacity derived in Ref. ramsauer2021hopfield. Bottom: Gradient descent simulations starting from initial condition $\boldsymbol{\xi}^1$. Points give values of $\lambda$ where we observe a crossover between retrieval (higher $\lambda$) and non-retrieval (lower $\lambda$). Solid lines are empirical quadratic fits in $1/N$. Horizontal lines are the predictions from our $N=+\infty$ theory.
  • Figure 2: Characterization of the basins of attraction given by the maximum angle $\theta_c(\alpha,\lambda)$ between a typical pattern and a random initialization such that the pattern is retrieved with high probability. The patterns here follow a spherical distribution and the configuration is also initialized on the hypersphere. Horizontal lines correspond to the angle of the nearest pattern.
  • Figure 3: Phase diagram for Gaussian patterns, showing the critical capacity for retrieval of a typical pattern, $\alpha_1$, as well as the lower bound for the capacity of retrieval of all patterns, $\alpha_c^{\mathrm{lb}}$. The $\alpha_*$ line is the condensation transition in the auxiliary REM. In the non-retrieval regime $\alpha>\alpha_1$ there exist two phases, one with $\alpha>\alpha_*$ where the energy is dominated by an exponential number of patterns, and the condensed phase $\alpha_1<\alpha<\alpha_*$.
  • Figure 4: The REM rate function $s_\rho(\varepsilon)$ for $\rho=1$ and the two pattern ensembles considered in this paper.
  • Figure 5: Size of attraction basins for patterns uniformly distributed on the hypersphere. We study here the case in which the number of stored patterns is polynomial, $P=\tilde{\alpha} N^k$, far below its exponential storage capacity. Given a pattern $\boldsymbol{\xi}^1$, we compute the cosine of the average angle to the nearest other pattern. We plot the expectation value of this cosine versus the theoretical prediction $\sqrt{2\frac{k\log N +\log \tilde{\alpha}}{N}}$. In the large $N$ limit, there is good agreement between the numerical result and the prediction.
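The prediction in the Figure 5 caption follows from an extreme-value argument. For random unit vectors in high dimension, each cosine $\cos(\boldsymbol{\xi}^1,\boldsymbol{\xi}^\mu)$ is approximately Gaussian with variance $1/N$, and the largest of $P-1$ such values is close to $\sqrt{2\log P/N}$. Substituting $\log P = k\log N+\log\tilde{\alpha}$ gives the formula. The Monte Carlo sketch below compares the two. It averages the largest cosine over independent draws, which is our reading of the caption's "expectation value". The function names and the values of $N$, $k$, $\tilde\alpha$, and the number of trials are illustrative choices, not the paper's settings.

```python
import numpy as np

def nearest_pattern_cosine(N, P, trials, rng):
    """Mean over trials of max_{mu != 1} cos(xi^1, xi^mu) for P uniform spherical patterns
    (our reading of the averaged nearest-pattern cosine)."""
    maxima = []
    for _ in range(trials):
        X = rng.standard_normal((P, N))
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
        maxima.append((X[1:] @ X[0]).max())            # nearest other pattern to xi^1
    return float(np.mean(maxima))

def predicted_cosine(N, k, alpha_tilde):
    """Prediction sqrt(2 (k log N + log alpha_tilde) / N) for P = alpha_tilde * N^k."""
    return float(np.sqrt(2 * (k * np.log(N) + np.log(alpha_tilde)) / N))

rng = np.random.default_rng(1)
k, alpha_tilde = 2, 0.1
for N in (50, 100, 200):
    P = int(round(alpha_tilde * N**k))
    sim = nearest_pattern_cosine(N, P, trials=20, rng=rng)
    pred = predicted_cosine(N, k, alpha_tilde)
    print(f"N={N:4d} P={P:6d} simulated={sim:.3f} predicted={pred:.3f} ratio={sim / pred:.3f}")
```

At these moderate sizes the simulated cosine typically sits somewhat below the prediction, because the leading-order formula omits slowly decaying $\log\log P$ corrections. The ratio should creep toward 1 as $N$ grows, consistent with the large-$N$ agreement reported in the caption.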