Sparse and Structured Hopfield Networks

Saul Santos; Vlad Niculae; Daniel McNamee; Andre F. T. Martins

TL;DR

This paper introduces a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations.

Abstract

Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
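As a rough illustration of what a sparse update rule can look like, the sketch below iterates $\bm{q} \leftarrow \bm{X}^\top \hat{\bm{y}}(\beta \bm{X}\bm{q})$ with sparsemax (2-entmax) as the sparse transformation. The helper names `sparsemax` and `hopfield_update`, the toy memory `X`, and the value of `beta` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (2-entmax)."""
    z_sorted = np.sort(z)[::-1]              # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, z.size + 1)
    support = 1 + ks * z_sorted > cumsum     # True for a prefix of the sorted scores
    k = ks[support][-1]                      # support size
    tau = (cumsum[k - 1] - 1) / k            # threshold
    return np.maximum(z - tau, 0.0)

def hopfield_update(X, q, beta=1.0, n_iters=5):
    """Iterate q <- X^T sparsemax(beta * X q); rows of X are the stored patterns."""
    y = None
    for _ in range(n_iters):
        y = sparsemax(beta * X @ q)          # sparse weights over the patterns
        q = X.T @ y                          # convex combination of the patterns
    return q, y

# Hypothetical toy memory with three 2-d patterns and a noisy query.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
q0 = np.array([0.9, 0.2])
q, y = hopfield_update(X, q0, beta=4.0)
```

In this toy case the weights `y` put all their mass on the first pattern, so the query is mapped exactly onto a stored pattern rather than onto a blend of several.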

Paper Structure (36 sections, 5 theorems, 39 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Notation.
Background
Hopfield Networks
Sparse Transformations and Fenchel-Young Losses
Sparse Hopfield-Fenchel-Young Energies
Definition and update rule
Margins, sparsity, and exact retrieval
Structured Hopfield Networks
Unary scores and structured constraints
General case: factor graph, high order interactions
Structured Fenchel-Young losses and margins
Guarantees for retrieval of pattern associations
Experiments
Hopfield dynamics and basins of attraction
...and 21 more sections

Key Result

proposition 1

Let the query $\bm{q}$ be in the convex hull of the rows of $\bm{X}$, i.e., $\bm{q} = \bm{X}^\top \bm{y}$ for some $\bm{y} \in \triangle_N$. Then, the energy (eq:hfy_energy) satisfies $0 \le E(\bm{q}) \le \min\left\{2M^2, \,\, -\beta^{-1}\Omega(\mathbf{1}/N) + \frac{1}{2}M^2\right\}$. Furthermore, min

Figures (7)

Figure 1: Overview of the Hopfield networks proposed in this paper: sparse transformations (entmax and normmax) aim to retrieve the closest pattern to the query, and they have exact retrieval guarantees. Structured variants find pattern associations. The $k$-subsets transformation returns a mixture of the top-$k$ patterns, and sequential $k$-subsets favors contiguous retrieval.
Figure 2: Sparse and structured transformations used in this paper and their regularization path. In each plot, we show $\hat{\bm{y}}_\Omega(\beta \bm{\theta}) = \hat{\bm{y}}_{\beta^{-1}\Omega}(\bm{\theta})$ as a function of the temperature $\beta^{-1}$ where $\bm{\theta} = [1.0716, -1.1221, -0.3288, 0.3368, 0.0425]^\top$. Additional examples can be found in App. \ref{sec:SST_App}.
Figure 3: Left: contours of the energy function and optimization trajectory of the CCCP iteration ($\beta = 1$). Right: attraction basins associated with each pattern, computed with $\beta = 10$ (a larger $\beta$ is needed for $1$-entmax to get $\epsilon$-close to a single pattern; for $\alpha = 1$ we allow a tolerance of $\epsilon = .01$). White sections do not converge to a single pattern but to a metastable state. Additional plots for different $\beta$ can be found in App. \ref{sec:HDBA}.
Figure 4: Example of human rationale overlap for the aspect "appearance". The yellow highlight indicates the model's rationale, while italicized and bold font represents the human rationale. Red font identifies mismatches with human annotations. SparseMAP with sequential $k$-subsets prefers more contiguous rationales, which better match humans. Additional examples are shown in App. \ref{sec:text_rationalization_details}.
Figure 5: Sparse and structured transformations used in this paper and their regularization path. In each plot, we show $\hat{\bm{y}}_\Omega(\beta \bm{\theta}) = \hat{\bm{y}}_{\beta^{-1}\Omega}(\bm{\theta})$ as a function of the temperature $\beta^{-1}$ where $\bm{\theta} = [1.0716, -1.1221, -0.3288, 0.3368, 0.0425]^\top$.
...and 2 more figures
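
The regularization path described in the captions above can be sketched numerically for the $k$-subsets case. The sketch assumes that, with unary scores only, the transformation is the Euclidean projection of $\beta\bm{\theta}$ onto $\{\bm{y} \in [0,1]^N : \mathbf{1}^\top \bm{y} = k\}$, and it solves for the threshold by bisection. The function name `k_subsets_projection`, the choice of bisection, and the sampled values of $\beta$ are assumptions made for illustration; only $\bm{\theta}$ is taken from the captions.

```python
import numpy as np

def k_subsets_projection(theta, k, n_bisect=60):
    """Project theta onto {y in [0,1]^N : sum(y) = k} by bisection on a threshold tau.

    The solution has the form y_i = clip(theta_i - tau, 0, 1); the sum of these
    entries is non-increasing in tau, so tau can be bracketed and bisected.
    """
    lo = theta.min() - 1.0   # every entry clips to 1 here, so the sum is N >= k
    hi = theta.max()         # every entry clips to 0 here, so the sum is 0 <= k
    for _ in range(n_bisect):
        tau = 0.5 * (lo + hi)
        if np.clip(theta - tau, 0.0, 1.0).sum() > k:
            lo = tau         # sum too large: raise the threshold
        else:
            hi = tau         # sum small enough: lower the threshold
    return np.clip(theta - 0.5 * (lo + hi), 0.0, 1.0)

# theta from the figure captions; the beta values are illustrative.
theta = np.array([1.0716, -1.1221, -0.3288, 0.3368, 0.0425])
path = {beta: k_subsets_projection(beta * theta, k=2) for beta in (0.1, 1.0, 10.0)}
for beta, y in path.items():
    print(f"temperature={1.0 / beta:5.2f}  y={np.round(y, 3)}")
```

At high temperature (small $\beta$) every entry stays strictly positive, while at low temperature the output concentrates on the two highest-scoring entries, which matches the "mixture of the top-$k$ patterns" behaviour described in Figure 1.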

Theorems & Definitions (9)

definition 1: Margin
proposition 1: Update rule of HFY energies
definition 2: Exact retrieval
proposition 2: Exact retrieval in a single iteration
proposition 3: Storage capacity with exact retrieval
definition 3: Structured margin
proposition 4
proposition 5: Exact structured retrieval
definition 4
