
Sparse and Structured Hopfield Networks

Saul Santos, Vlad Niculae, Daniel McNamee, Andre F. T. Martins

TL;DR

A new family of Hopfield-Fenchel-Young energies, whose update rules are end-to-end differentiable sparse transformations, is provided.
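To make "end-to-end differentiable sparse transformation" concrete, here is a minimal NumPy sketch of sparsemax (equivalently, 2-entmax), one member of the entmax family shown in Figure 1, together with its Jacobian-vector product. The function names `sparsemax` and `sparsemax_jvp` are hypothetical, and this is not the authors' implementation. It covers only α = 2; other entmax values and normmax require different solvers.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sparsemax / 2-entmax)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Entry k stays in the support while 1 + k * z_(k) > z_(1) + ... + z_(k).
    in_support = 1.0 + k * z_sorted > cumsum
    k_star = k[in_support][-1]           # support size
    tau = (cumsum[k_star - 1] - 1.0) / k_star
    return np.maximum(z - tau, 0.0)      # exact zeros outside the support


def sparsemax_jvp(p, v):
    """Jacobian-vector product J v at output p, with J = diag(s) - s s^T / |S|."""
    s = (p > 0).astype(float)            # indicator of the support S
    return s * (v - (s @ v) / s.sum())


p = sparsemax([1.0, 0.2, -1.0])                    # [0.9, 0.1, 0.0]
g = sparsemax_jvp(p, np.array([1.0, 0.0, 0.0]))    # [0.5, -0.5, 0.0]
```

Because the Jacobian vanishes outside the support, gradients only reach the selected entries, yet the map is piecewise linear and differentiable almost everywhere, which is what lets such updates sit inside a trained model.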

Abstract

Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
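The structured variants replace a distribution over single patterns with one over pattern associations. SparseMAP only needs a MAP oracle for the chosen structure, and for the $k$-subsets structure in Figure 1 that oracle simply selects the $k$ highest-scoring patterns. The sketch below shows this oracle and the resulting uniform mixture of the top-$k$ patterns, which is roughly the low-temperature limit of $k$-subsets retrieval. The names `k_subsets_map` and `topk_mixture` and the $1/k$ normalization are our assumptions; the full SparseMAP solver (which returns a sparse combination of several subsets) is not shown.

```python
import numpy as np


def k_subsets_map(theta, k):
    """MAP oracle for k-subsets: argmax of theta^T z over z in {0,1}^N with sum(z) = k."""
    theta = np.asarray(theta, dtype=float)
    z = np.zeros_like(theta)
    z[np.argsort(-theta)[:k]] = 1.0      # indicator of the k largest scores
    return z


def topk_mixture(X, q, k):
    """Uniform average of the k stored patterns (rows of X) most similar to q."""
    z = k_subsets_map(X @ q, k)
    return X.T @ z / k                   # assumed 1/k normalization


X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]])
q = np.array([1.0, 0.0])
print(topk_mixture(X, q, k=2))           # mixes patterns 0 and 3
```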

Paper Structure (36 sections, 5 theorems, 39 equations, 7 figures, 7 tables, 1 algorithm)

This paper contains 36 sections, 5 theorems, 39 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

proposition 1

Let the query $\bm{q}$ be in the convex hull of the rows of $\bm{X}$, i.e., $\bm{q} = \bm{X}^\top \bm{y}$ for some $\bm{y} \in \triangle_N$. Then, the Hopfield-Fenchel-Young energy (eq:hfy_energy) satisfies $0 \le E(\bm{q}) \le \min\left\{2M^2, \,\, -\beta^{-1}\Omega(\mathbf{1}/N) + \frac{1}{2}M^2\right\}$. Furthermore, min…
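The right-hand side of the bound is easy to evaluate once $\Omega$ is fixed. The sketch below does so for two common regularizers, using the conventions of the Fenchel-Young loss literature: Shannon negentropy $\Omega(\bm{y}) = \sum_i y_i \log y_i$ (so $-\Omega(\mathbf{1}/N) = \log N$) and the $\alpha = 2$ Tsallis regularizer $\Omega(\bm{y}) = \tfrac{1}{2}(\|\bm{y}\|^2 - 1)$ (so $-\Omega(\mathbf{1}/N) = \tfrac{1}{2}(1 - 1/N)$). We also assume $M = \max_i \|\bm{x}_i\|$, the largest pattern norm, since $M$ is not defined in this excerpt. The energy $E$ itself is not reproduced here, so the code only computes the bound; all function names are hypothetical.

```python
import numpy as np


def energy_upper_bound(X, beta, neg_reg_uniform):
    """min{2 M^2, -Omega(1/N) / beta + M^2 / 2}.

    Assumes M = max_i ||x_i|| over the rows of X; neg_reg_uniform = -Omega(1/N).
    """
    M = np.max(np.linalg.norm(X, axis=1))
    return min(2 * M**2, neg_reg_uniform / beta + 0.5 * M**2)


def neg_shannon_uniform(N):
    # Omega(y) = sum_i y_i log y_i  =>  -Omega(1/N) = log N
    return np.log(N)


def neg_tsallis2_uniform(N):
    # Omega(y) = (||y||^2 - 1) / 2  =>  -Omega(1/N) = (1 - 1/N) / 2
    return 0.5 * (1.0 - 1.0 / N)


X = np.eye(4)  # four orthonormal patterns, so M = 1
print(energy_upper_bound(X, beta=1.0, neg_reg_uniform=neg_shannon_uniform(4)))   # log 4 + 0.5
print(energy_upper_bound(X, beta=1.0, neg_reg_uniform=neg_tsallis2_uniform(4)))  # 0.875
```

With the Tsallis regularizer the $\Omega$-dependent term stays below $1/(2\beta)$ for every $N$, whereas the Shannon term grows like $\log N / \beta$.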

Figures (7)

  • Figure 1: Overview of the Hopfield networks proposed in this paper: sparse transformations (entmax and normmax) aim to retrieve the closest pattern to the query, and they have exact retrieval guarantees. Structured variants find pattern associations. The $k$-subsets transformation returns a mixture of the top-$k$ patterns, and sequential $k$-subsets favors contiguous retrieval.
  • Figure 2: Sparse and structured transformations used in this paper and their regularization path. In each plot, we show $\hat{\bm{y}}_\Omega(\beta \bm{\theta}) = \hat{\bm{y}}_{\beta^{-1}\Omega}(\bm{\theta})$ as a function of the temperature $\beta^{-1}$ where $\bm{\theta} = [1.0716, -1.1221, -0.3288, 0.3368, 0.0425]^\top$. Additional examples can be found in the appendix.
  • Figure 3: Left: contours of the energy function and optimization trajectory of the CCCP iteration ($\beta = 1$). Right: attraction basins associated with each pattern ($\beta = 10$). White regions do not converge to a single pattern but to a metastable state. A larger $\beta$ is needed for $1$-entmax to get $\epsilon$-close to a single pattern; for $\alpha = 1$ we allow a tolerance of $\epsilon = 0.01$. Additional plots for different values of $\beta$ can be found in the appendix.
  • Figure 4: Example of human rationale overlap for the aspect "appearance". The yellow highlight indicates the model's rationale, while italicized and bold font represent the human rationale. Red font identifies mismatches with human annotations. SparseMAP with sequential $k$-subsets prefers more contiguous rationales, which better match human ones. Additional examples are shown in the appendix.
  • Figure 5: Sparse and structured transformations used in this paper and their regularization path. In each plot, we show $\hat{\bm{y}}_\Omega(\beta \bm{\theta}) = \hat{\bm{y}}_{\beta^{-1}\Omega}(\bm{\theta})$ as a function of the temperature $\beta^{-1}$ where $\bm{\theta} = [1.0716, -1.1221, -0.3288, 0.3368, 0.0425]^\top$.
  • ...and 2 more figures
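Figures 2 and 3 can be partly reproduced with a few lines of NumPy, for the sparsemax ($\alpha = 2$) case only. The sketch below (a) traces the regularization path $\hat{\bm{y}}_\Omega(\beta\bm{\theta})$ for the $\bm{\theta}$ given in the Figure 2 caption, and (b) iterates a retrieval update of the assumed form $\bm{q} \leftarrow \bm{X}^\top \operatorname{sparsemax}(\beta \bm{X}\bm{q})$, the attention-like update associated with modern Hopfield networks. That this update coincides with the CCCP iteration of Figure 3 in the sparsemax case is our assumption; the helper names are hypothetical and the toy patterns are made up for illustration.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_star = k[1.0 + k * z_sorted > cumsum][-1]   # support size
    tau = (cumsum[k_star - 1] - 1.0) / k_star
    return np.maximum(z - tau, 0.0)


# (a) Regularization path for the theta in the Figure 2 caption.
theta = np.array([1.0716, -1.1221, -0.3288, 0.3368, 0.0425])


def regularization_path(theta, temperatures):
    """sparsemax(beta * theta) with beta = 1 / T for each temperature T."""
    return [sparsemax(theta / T) for T in temperatures]


for T, p in zip([10.0, 1.0, 0.1], regularization_path(theta, [10.0, 1.0, 0.1])):
    # Support shrinks as T decreases: 5, then 2, then 1 nonzero entries.
    print(f"T={T}: support size {np.count_nonzero(p)}")


# (b) Fixed-point retrieval with an assumed update q <- X^T sparsemax(beta X q).
def retrieve(X, q, beta=1.0, max_iter=100, tol=1e-8):
    for _ in range(max_iter):
        q_new = X.T @ sparsemax(beta * (X @ q))
        if np.linalg.norm(q_new - q) < tol:
            return q_new
        q = q_new
    return q


X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # three toy patterns (rows)
q0 = np.array([0.8, 0.1])
print(retrieve(X, q0, beta=10.0))   # exact retrieval of pattern 0
print(retrieve(X, q0, beta=1.0))    # mixture of patterns 0 and 1 (metastable state)
```

The two calls mirror the Figure 3 caption: with a small $\beta$ the iteration settles on a mixture, while a larger $\beta$ reaches a single stored pattern exactly.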

Theorems & Definitions (9)

  • definition 1: Margin
  • proposition 1: Update rule of HFY energies
  • definition 2: Exact retrieval
  • proposition 2: Exact retrieval in a single iteration
  • proposition 3: Storage capacity with exact retrieval
  • definition 3: Structured margin
  • proposition 4
  • proposition 5: Exact structured retrieval
  • definition 4
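Definition 1 and Proposition 2 tie exact retrieval to loss margins. For sparsemax, a standard fact from the Fenchel-Young loss literature is that the margin equals 1: sparsemax returns the one-hot vector $\bm{e}_y$ exactly when $\theta_y - \max_{j \ne y}\theta_j \ge 1$. The check below illustrates this threshold numerically. It is a sketch for $\alpha = 2$ only, and the helper `is_one_hot` is hypothetical, not taken from the paper.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_star = k[1.0 + k * z_sorted > cumsum][-1]   # support size
    tau = (cumsum[k_star - 1] - 1.0) / k_star
    return np.maximum(z - tau, 0.0)


def is_one_hot(p, y):
    """True if p is (numerically) the indicator vector of index y."""
    e = np.zeros_like(p)
    e[y] = 1.0
    return np.allclose(p, e)


# Score of index 0 minus the runner-up score (0.0) equals `gap`.
for gap in [0.5, 0.99, 1.0, 1.5]:
    p = sparsemax(np.array([gap, 0.0, -0.3]))
    print(gap, is_one_hot(p, 0))   # False, False, True, True
```

For softmax ($\alpha = 1$) no finite gap yields an exact one-hot output, which is why Figure 3 needs a tolerance $\epsilon$ for that case.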