Hopfield model with planted patterns: a teacher-student self-supervised learning model

Francesco Alemanno, Luca Camanzi, Gianluca Manzan, Daniele Tantari

TL;DR

This work extends the Hopfield model to planted, correlated patterns within a teacher-student self-supervised learning framework, in which the student's weights play the role of the spins and the training examples, which encode the planted signal, play the role of the patterns. It shows a transition between memorization and generalization driven by the training set size $M$, the dataset noise $\hat{\beta}^{-1}$, and the inference temperature $\beta^{-1}$. A key analytic result on the Nishimori line ($\hat{\beta}=\beta$) with extensive data ($M=\gamma N$) is the phase boundary $\beta_c^{-1}=1+\sqrt{\gamma}$. Replica-symmetric analysis gives nonzero learning order parameters $m$ and $q$ below this critical temperature, with $m=q$ on the Nishimori line, indicating learning by generalization without a spin-glass phase, while memorization is the learning mechanism when the dataset is small but informative. The results connect classical memory-capacity phenomena to self-supervised learning, showing how dataset structure and size can enable generalization even when individual examples are weakly informative.
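
As a small illustration of the phase boundary quoted above, the following sketch evaluates $\beta_c^{-1}=1+\sqrt{\gamma}$ and its inverse on the Nishimori line. The function names (`critical_temperature`, `critical_load`, `can_learn`) are hypothetical helpers introduced here, not part of the paper, and the convention that no extensive dataset is needed for $\beta^{-1}\leq 1$ is an assumption based on the finite-$M$ discussion in the summary.

```python
import math


def critical_temperature(gamma: float) -> float:
    """Critical temperature on the Nishimori line: beta_c^{-1} = 1 + sqrt(gamma)."""
    if gamma < 0:
        raise ValueError("gamma = M/N must be non-negative")
    return 1.0 + math.sqrt(gamma)


def critical_load(temperature: float) -> float:
    """Invert the boundary: the gamma at which `temperature` becomes critical.

    Assumption: for temperature <= 1 the boundary is not the limiting factor,
    so 0.0 is returned.
    """
    if temperature <= 1.0:
        return 0.0
    return (temperature - 1.0) ** 2


def can_learn(temperature: float, gamma: float) -> bool:
    """True if (temperature, gamma) lies strictly below the critical line."""
    return temperature < critical_temperature(gamma)


if __name__ == "__main__":
    for gamma in (0.25, 1.0, 4.0):
        print(f"gamma={gamma}: critical temperature = {critical_temperature(gamma)}")
```

Read this way, a noisier dataset (larger $\beta^{-1}$) requires a quadratically larger load $\gamma$ before the student leaves the paramagnetic phase.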

Abstract

While Hopfield networks are known as paradigmatic models for memory storage and retrieval, modern artificial intelligence systems mainly rely on the machine learning paradigm. We show that it is possible to formulate a teacher-student self-supervised learning problem with Boltzmann machines in terms of a suitable generalization of the Hopfield model with structured patterns, where the spin variables are the machine weights and the patterns correspond to the training set's examples. We analyze the learning performance by studying the phase diagram in terms of the training set size, the dataset noise and the inference temperature (i.e. the weight regularization). With a small but informative dataset the machine can learn by memorization. With a noisy dataset, an extensive number of examples above a critical threshold is needed. In this regime the memory storage limits of the system become an opportunity for the occurrence of a learning regime in which the system can generalize.

Paper Structure (7 sections, 13 theorems, 143 equations, 5 figures)


Key Result

Theorem 1

If $\beta\leq 1$, or $\beta>1$ and $\lambda,h\in\mathbb{R}\setminus\{0\}$, it holds for any $\bm{\epsilon}\in \{-1,1\}^M$ [...] where we have defined the random vector $\bm{s}\in \{-1,1\}^M$ whose entries are i.i.d. random variables with mean [...]

Figures (5)

  • Figure 1: Learning performance with a finite dataset size $M$. Left: the system's magnetization, i.e. the overlap between the teacher and student patterns, as a function of the temperature $\beta^{-1}$. The overlap increases with the number of examples $M$ as long as the system is below the critical temperature. Right: the system's free energy as a function of $\beta^{-1}$. The free energies corresponding to solutions with $m>0$ are drawn as solid lines, while the ergodic (E) free energy, i.e. the one corresponding to $m=0$, is drawn as a dashed line. As long as $\beta^{-1}<1$, the global minimum of the free energy is the state in which the machine can learn the original pattern.
  • Figure 2: Learning performance with a noisy ($\beta<1$) but extensive ($M=\gamma N$) dataset on the Nishimori line. Left: the system's magnetization $m$ as a function of the inverse temperature $\beta$ for different dataset sizes $\gamma$. Right: the magnetization $m$ as a function of $\gamma$ for different inverse temperatures $\beta$. The quality of the inferred pattern displays a second-order phase transition; it increases with $\gamma$ and decreases with the dataset noise $\beta^{-1}$.
  • Figure 3: Phase diagram of the model on the Nishimori line. For $\beta^{-1} > 1+\sqrt{\gamma}$, the student machine is in the paramagnetic phase with $m = 0$, where learning is impossible. For $\beta^{-1} < 1+\sqrt{\gamma}$, it enters a learning phase in which it can infer the original pattern by generalization from a sea of corrupted examples provided by the teacher. For $\beta^{-1}<1$ each example is highly informative and the learning performance is optimal ($m=1$).
  • Figure 4: Phase diagram of the model in the mismatched setting with finite $M$ ($M=1$ on the left, $M=3$ on the right), in terms of the dataset information $\hat{\beta}$ and the inference temperature $\beta^{-1}$. According to the values of $m$ and $\bm{p}$ that solve Eqs. (\ref{eq:m_mism}, \ref{eq:self_mism}), four different regimes appear: in the paramagnetic (P) regime $m=0$ and $\bm{p}=\bm{0}$; in the example retrieval (eR) regime $\bm{p}\neq \bm{0}$ but $m=0$; in the signal retrieval (sR) regime $\bm{p}=\bar{p}\bm{1}$ is homogeneous and $m>0$; in the mixed retrieval (mR) regime $\bm{p}\neq\bm{0}$ and $m>0$. Only in the sR and mR regimes can the machine learn the original signal, and the learning performance increases monotonically with $\hat{\beta}$. The Nishimori line $\hat{\beta}=\beta$ is shown in green.
  • Figure 5: Phase diagram of the model in the mismatched setting, where $\hat{\beta} \neq \beta$, with an extensive dataset $M=\gamma N$. In the paramagnetic (P) and spin glass (SG) regions learning is impossible. For higher values of the dataset size, the machine enters a signal retrieval (sR) region where it learns by generalization. In this region the learning performance $m$ has a maximum (dot-dash line) at a specific value of the inference temperature. In particular, if $\beta^{-1}$ gets too low the machine enters the example retrieval (eR) region, where it is forced to work by memorization even though this approach is inefficient for learning. The dotted line is the Nishimori condition $\beta=\hat{\beta}$.
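
The four finite-$M$ regimes named in the Figure 4 caption can be summarized as a small decision rule on the order parameters $m$ and $\bm{p}$. The sketch below is only an illustration: the function `classify_regime` and the tolerance `tol` are hypothetical, and treating mR as the case where $\bm{p}$ is not homogeneous is an assumption made here to separate it from sR, since the caption does not state this explicitly.

```python
from typing import Sequence


def classify_regime(m: float, p: Sequence[float], tol: float = 1e-6) -> str:
    """Label a solution (m, p) with one of the regimes P, eR, sR, mR.

    m: overlap with the planted signal.
    p: non-empty sequence of overlaps with the M examples.
    tol: numerical threshold below which a value counts as zero (assumption).
    """
    signal_learned = m > tol
    p_is_zero = all(abs(x) <= tol for x in p)
    # Homogeneous means all entries equal, i.e. p = p_bar * (1, ..., 1).
    p_is_homogeneous = max(p) - min(p) <= tol

    if signal_learned:
        # Assumption: any non-homogeneous p with m > 0 counts as mixed retrieval.
        return "sR" if p_is_homogeneous else "mR"
    return "P" if p_is_zero else "eR"


if __name__ == "__main__":
    print(classify_regime(0.0, [0.0, 0.0, 0.0]))  # paramagnetic
    print(classify_regime(0.0, [0.9, 0.0, 0.0]))  # example retrieval
    print(classify_regime(0.5, [0.3, 0.3, 0.3]))  # signal retrieval
    print(classify_regime(0.5, [0.8, 0.1, 0.1]))  # mixed retrieval
```

In practice the values of $m$ and $\bm{p}$ would come from solving the self-consistency equations referenced in the caption, which are not reproduced here.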

Theorems & Definitions (28)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Conjecture 1
  • Conjecture 2
  • Proposition 5
  • proof
  • Proposition 6
  • ...and 18 more