Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

Saul Santos, Vlad Niculae, Daniel McNamee, André F. T. Martins

TL;DR

The paper presents a unified Hopfield-Fenchel-Young (HFY) framework that recasts associative memory retrieval as energy minimization using a difference of Fenchel-Young losses. By selecting appropriate convex regularizers and post-transformations, HFY encompasses classical Hopfield networks, dense associative memories, modern Hopfield networks, and sparse/structured variants, while enabling exact retrieval via margin-based sparsity and allowing normalization-based post-transformations. The Structured and Sparse HFY extensions leverage SparseMAP and k-subsets to retrieve pattern associations and higher-order relations, with theoretical guarantees on margins and retrieval, plus practical algorithms for memory recall tasks including free and sequential recall. The experiments demonstrate improved retrieval capacity, metastable state behavior, and competitive performance on image retrieval, multiple instance learning, and text rationalization, highlighting the framework’s versatility and potential for integrating memory retrieval with transformer-like attention mechanisms. Overall, HFY offers a principled, convex-analysis-based foundation for designing sparse, structured, and differentiable Hopfield-type memory systems with broad applicability to memory-centric AI tasks.

Abstract

Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework, Hopfield-Fenchel-Young networks, which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization, all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.
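To make the link between sparse transformations and exact retrieval concrete, here is a minimal NumPy sketch. It assumes the standard modern-Hopfield update $\bm{q} \leftarrow \bm{X}^\top \hat{\bm{y}}(\beta \bm{X}\bm{q})$ with no post-transformation, and uses sparsemax as one sparse choice of $\hat{\bm{y}}$. The toy memory `X`, the query `q0`, the value of `beta`, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Dense transformation: every entry stays strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # True for entries kept in the support
    k_max = k[support][-1]                   # support size
    tau = (cumsum[k_max - 1] - 1) / k_max    # threshold
    return np.maximum(z - tau, 0.0)

def hopfield_update(X, q, beta, transform, n_iter=1):
    """Iterate q <- X^T transform(beta * X q); rows of X are stored patterns."""
    for _ in range(n_iter):
        q = X.T @ transform(beta * (X @ q))
    return q

# Hypothetical toy memory: three orthogonal patterns in R^4.
X = np.eye(3, 4)
q0 = np.array([0.9, 0.2, -0.1, 0.05])        # noisy version of the first pattern
beta = 2.0

q_sparse = hopfield_update(X, q0, beta, sparsemax)
q_soft = hopfield_update(X, q0, beta, softmax)
print("sparsemax retrieval:", q_sparse)
print("softmax retrieval:  ", q_soft)
```

With these toy values the sparsemax weights are one-hot on the first pattern, so a single update returns that pattern exactly, whereas softmax always returns a mixture of all stored patterns.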

Paper Structure

This paper contains 59 sections, 10 theorems, 52 equations, 14 figures, 9 tables, 4 algorithms.

Key Result

Proposition 1

Fenchel-Young losses $L_\Omega(\bm{\theta}, \bm{y})$ satisfy the following properties:

Figures (14)

  • Figure 1: Overview of Hopfield scoring functions: sparse transformations (entmax and normmax) aim to retrieve the closest pattern to the query, and they have exact retrieval guarantees. Structured variants find pattern associations. The $k$-subsets transformation favors a mixture of the top-$k$ patterns, and sequential $k$-subsets favors contiguous retrieval.
  • Figure 2: Sparse and structured transformations used in this paper and their regularization path. In each plot, we show $\hat{\bm{y}}_\Omega(\beta \bm{\theta}) = \hat{\bm{y}}_{\beta^{-1}\Omega}(\bm{\theta})$ as a function of the temperature $\beta^{-1}$ where $\bm{\theta} = [1.0716, -1.1221, -0.3288, 0.3368, 0.0425]^\top$.
  • Figure 3: Examples of generalized entropies (left) are presented alongside their corresponding prediction distributions (middle) and Fenchel-Young losses (right) for the binary case. Here, $\bm{y} = [p, 1 - p] \in \triangle_2$, $\bm{\theta} = [s, 0] \in \mathbb{R}^2$, and $\bm{e}_1$ is the one-hot vector for the first class. Unlike softmax, which never reaches exactly zero and consequently does not have a margin, all other distributions shown in the center can exhibit sparse support.
  • Figure 4: (Top) Memory capacity in terms of unique, non-repeated memories using various free recall methods for different numbers of stored memories with $\beta = 0.1$. (Bottom) Unique memory ratio as a function of $\beta$ for a memory size of 128. Plotted are the medians over 5 runs with different memories and the interquartile range.
  • Figure 5: Simulation of free recall using our two methods on MNIST (LeCun et al., 1998): the constrained sparsemax algorithm and the penalized sparsemax algorithm. For both methods, we set the number of Hopfield iterations to $T=5$. In the penalized free recall method, we apply a penalty of $\lambda = 10^{8}$ and a decay rate of $\tau=0.001$. In both cases, we set $\beta=0.1$. Red highlights correspond to repeated memories.
  • ...and 9 more figures
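
The regularization path described in the Figure 2 caption can be sketched numerically for one of the transformations. The snippet below is an illustrative assumption rather than the paper's plotting code: it evaluates sparsemax at $\beta \bm{\theta}$ for the $\bm{\theta}$ given in the caption and for three arbitrarily chosen values of $\beta$, and reports how many entries remain nonzero. The other transformations in the figure are not covered.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

# theta as stated in the Figure 2 caption.
theta = np.array([1.0716, -1.1221, -0.3288, 0.3368, 0.0425])

# Hypothetical grid of inverse temperatures (not from the paper).
betas = [0.1, 1.0, 10.0]

path = [sparsemax(beta * theta) for beta in betas]
supports = [int(np.count_nonzero(p)) for p in path]

for beta, p, s in zip(betas, path, supports):
    print(f"beta={beta:5.1f}  temperature={1 / beta:5.1f}  support={s}  y={np.round(p, 4)}")
```

As $\beta$ grows (temperature $\beta^{-1}$ shrinks), the support shrinks from all five entries to the single largest one, which is the behavior that underlies the exact retrieval guarantees of the sparse transformations.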

Theorems & Definitions (16)

  • Proposition 1: Properties of Fenchel-Young losses
  • Proposition 2: Update rule of HFY energies
  • Proposition 3: Layer normalization
  • Definition 4
  • Definition 5: Margin
  • Proposition 6: Margin Properties of Tsallis and Norm-Entropies
  • Proposition 7: Update rule of sparse HFY energies
  • Definition 8: Exact retrieval
  • Proposition 9: Exact retrieval in a single iteration
  • Proposition 10: Storage capacity with exact retrieval
  • ...and 6 more