Random Features Hopfield Networks generalize retrieval to previously unseen examples

Silvio Kalaj, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico M. Malatesta, Matteo Negri

TL;DR

This work reveals that a Hopfield Network storing examples generated as superpositions of random features develops attractors not only for the features themselves but also for previously unseen examples generated with the same set of features. It argues that, as the number of stored examples increases beyond the learning transition, the model learns to mix the features to represent both stored and previously unseen examples.

Abstract

It has been recently shown that a learning transition happens when a Hopfield Network stores examples generated as superpositions of random features, where new attractors corresponding to such features appear in the model. In this work we reveal that the network also develops attractors corresponding to previously unseen examples generated with the same set of features. We explain this surprising behaviour in terms of spurious states of the learned features: we argue that, increasing the number of stored examples beyond the learning transition, the model also learns to mix the features to represent both stored and previously unseen examples. We support this claim with the computation of the phase diagram of the model.
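
The setup described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code. It assumes Hebbian couplings with zero diagonal, zero-temperature synchronous updates, examples built as the sign of a signed sum of `L` randomly chosen binary features, and a magnetization defined as the overlap between a probe pattern and the state the dynamics reaches when started from it. All function names, the update rule, and the sizes are hypothetical choices made for illustration.

```python
import random


def random_pattern(n, rng):
    """A vector of n independent +/-1 entries."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def make_example(features, L, rng):
    """Sign of a signed sum of L distinct features; odd L rules out ties."""
    chosen = rng.sample(range(len(features)), L)
    signs = [rng.choice((-1, 1)) for _ in chosen]
    n = len(features[0])
    return [
        1 if sum(s * features[k][i] for s, k in zip(signs, chosen)) > 0 else -1
        for i in range(n)
    ]


def sweep(examples, state):
    """One synchronous zero-temperature update under Hebbian couplings
    J_ij = (1/N) sum_nu xi_i^nu xi_j^nu with J_ii = 0 (assumed rule)."""
    n, p = len(state), len(examples)
    # Overlap of the current state with every stored example.
    m = [sum(x * s for x, s in zip(ex, state)) / n for ex in examples]
    new_state = []
    for i in range(n):
        # Local field; the last term removes the self-coupling J_ii = P/N.
        h = sum(ex[i] * m_nu for ex, m_nu in zip(examples, m)) - p / n * state[i]
        new_state.append(1 if h >= 0 else -1)
    return new_state


def magnetization(examples, pattern, max_sweeps=20):
    """Overlap between `pattern` and the state reached when starting from it."""
    state = list(pattern)
    for _ in range(max_sweeps):
        new_state = sweep(examples, state)
        if new_state == state:  # fixed point reached
            break
        state = new_state
    return sum(a * b for a, b in zip(state, pattern)) / len(pattern)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed convention: alpha = P/N and alpha_D = D/N.
    N, D, P, L = 200, 20, 400, 3
    features = [random_pattern(N, rng) for _ in range(D)]
    train = [make_example(features, L, rng) for _ in range(P)]
    test = make_example(features, L, rng)  # same features, drawn independently
    print("feature:", magnetization(train, features[0]))
    print("train:  ", magnetization(train, train[0]))
    print("test:   ", magnetization(train, test))
```

At these toy sizes the printed values should not be expected to match the paper's figures, which use $N=32000$; the sketch only shows how the three probes (a feature, a stored example, an unseen example) would be compared against the same set of couplings.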

Paper Structure (18 sections, 68 equations, 3 figures)

Figures (3)

  • Figure 1: Training and test examples become fixed points after the features have been learned. Magnetization as a function of $\alpha$, for fixed $\alpha_D$. The blue line is the magnetization $\mu$ of hidden features, which grows to $1$ if $\alpha$ is high enough (learning phase). The orange line is the magnetization $m^\mathrm{train}$ of the training examples, which is $m^\mathrm{train}\simeq1$ for low $\alpha$ and drops when $\alpha$ increases, as expected from an associative memory (storage phase). Surprisingly, $m^\mathrm{train}$ grows to $1$ again for high values of $\alpha$. Near this transition, test examples also have $m^\mathrm{test}=1$, as shown by the red line (generalization phase). $N=32000$; averages over $40$ samples. The dashed line shows the analytical prediction for the magnetization of mixtures of $3$ features.
  • Figure 2: Combinations of more features require more training examples. a) Feature magnetization $\mu$ (dotted) and test-example magnetization $m^\mathrm{test}$ (solid) as a function of $\alpha$ for different $\alpha_D$ (subplots). Different colors represent increasing numbers of features per example $L$. Dashed vertical lines are the analytical predictions: black marks the learning transition; colors correspond to the transitions for mixtures of $L=3,5,7$ features. ($N=32000$; averages over $40$ samples.) b) Scaling with $N$ of the maximum number of features $D_\mathrm{gen}$ for which we observe a generalization transition. Specifically, we plot the maximum $D(N)$ at which $10$ samples have $m^\mathrm{test} > 0.9$, and we average over $4$ to $10$ groups of $10$ samples, depending on $N$.
  • Figure 3: Comparison between the phase diagram of a standard Hopfield Model from Amit et al. (1987) (top, temperature $T$ vs $\alpha$) and the phase diagram of the dense Random Features Hopfield Model (bottom, $\alpha$ vs $\alpha_D$). In both panels, the blue line is the retrieval line, below which the features can be stored and retrieved. The red, green and yellow lines are the retrieval lines of mixtures of 3, 5 and 7 examples, respectively. When $\alpha\to\infty$, the curves of the Random Features Hopfield Model connect to the critical points at $T=0$ in the Hopfield Model: high-order mixtures are stable at lower $\alpha_D$. We also show how these critical points evolve at finite temperature.
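
The captions refer to mixtures of $3$, $5$ and $7$ features. As a worked illustration (a standard calculation, not taken from the paper), consider the simplest symmetric mixture $\xi_i = \mathrm{sign}\big(\sum_{k=1}^{L} f_{k,i}\big)$ of $L$ independent random $\pm1$ features with $L$ odd; the symbols $\xi$, $f_k$ and $S_i$ are notation assumed here. The overlap of the mixture with any one of its constituent features equals the probability that the other $L-1$ features cancel exactly, because otherwise their sum $S_i$ alone fixes the sign and is uncorrelated with $f_{1,i}$:

$$
\mathbb{E}\left[f_{1,i}\,\xi_i\right] \;=\; \Pr\left[S_i = 0\right] \;=\; \frac{1}{2^{L-1}}\binom{L-1}{\frac{L-1}{2}},
\qquad S_i = \sum_{k=2}^{L} f_{k,i}.
$$

$$
L=3:\ \frac{2}{4}=\frac{1}{2}, \qquad
L=5:\ \frac{6}{16}=\frac{3}{8}, \qquad
L=7:\ \frac{20}{64}=\frac{5}{16}.
$$

Under this assumption, each additional pair of features lowers the overlap between a mixture and any single constituent feature.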