Dense Hopfield Networks in the Teacher-Student Setting

Robin Thériault; Daniele Tantari

TL;DR

The paper analyzes dense $p$-body Hopfield networks in a controlled teacher–student (inverse learning) setting, deriving the phase diagram for unsupervised pattern retrieval and revealing ferromagnetic regimes that correspond to prototype and feature learning. It shows that on the Nishimori line the inverse problem remains replica-symmetric and that the retrieval transition aligns with the direct model’s spin-glass transition, while outside this line larger student $p$ yields extensive noise tolerance and an explicit zero-temperature adversarial-robustness formula. The work provides exact RS results for phase boundaries, clarifies universality connections between direct and inverse problems, and explains why prototype phases exhibit adversarial robustness. These insights illuminate how increasing model capacity and aligning teacher-student parameters can enhance robustness and data efficiency in dense Hopfield networks, with potential implications for understanding modern robust representations and learning architectures.

Abstract

Dense Hopfield networks are known for their feature-to-prototype transition and adversarial robustness. However, previous theoretical studies have been mostly concerned with their storage capacity. We bridge this gap by studying the phase diagram of $p$-body Hopfield networks in the teacher-student setting of an unsupervised learning problem, uncovering ferromagnetic phases reminiscent of the prototype and feature learning regimes. On the Nishimori line, we find the critical size of the training set necessary for efficient pattern retrieval. Interestingly, we find that the paramagnetic-to-ferromagnetic transition of the teacher-student setting coincides with the paramagnetic-to-spin-glass transition of the direct model, i.e. the model with random patterns. Outside of the Nishimori line, we investigate the learning performance in relation to the inference temperature and dataset noise. Moreover, we show that using a larger $p$ for the student than for the teacher gives the student an extensive tolerance to noise. We then derive a closed-form expression measuring the adversarial robustness of such a student at zero temperature, corroborating the positive correlation between number of parameters and robustness observed in large neural networks. We also use our model to clarify why the prototype phase of modern Hopfield networks is adversarially robust.
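
As a rough illustration of the teacher-student setup described in the abstract, the sketch below draws examples from a single teacher pattern. It assumes the teacher distribution has the form $P(\sigma \mid \xi^*) \propto \exp(\beta^* N m^{p^*})$ with overlap $m = \xi^* \cdot \sigma / N$. This normalization and the names `sample_examples`, `xi_star` and `beta_star` are assumptions made for illustration and are not taken from the paper. Under this assumption the weight depends on $\sigma$ only through the overlap, so one can first sample how many spins agree with $\xi^*$ and then choose which ones.

```python
import math
import numpy as np

def sample_examples(xi_star, beta_star, p, num_examples, rng):
    """Draw examples sigma with P(sigma | xi*) proportional to exp(beta* N m^p),
    where m = (xi* . sigma) / N. The normalization is an assumption."""
    n = xi_star.size
    k = np.arange(n + 1)          # number of spins agreeing with xi*
    m = 2.0 * k / n - 1.0         # overlap obtained with k agreements
    # log-weight of each k: log C(n, k) (entropy) + beta* N m^p (energy term)
    log_binom = np.array([
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        for j in k
    ])
    log_w = log_binom + beta_star * n * m**p
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()

    examples = np.empty((num_examples, n), dtype=int)
    for a in range(num_examples):
        agree = rng.choice(n + 1, p=w)        # how many spins agree with xi*
        flip = rng.permutation(n)[agree:]     # the n - agree spins that disagree
        sigma = xi_star.copy()
        sigma[flip] *= -1
        examples[a] = sigma
    return examples

# Illustrative usage with arbitrary (hypothetical) parameter values.
rng = np.random.default_rng(0)
xi_star = rng.choice([-1, 1], size=64)
sigma = sample_examples(xi_star, beta_star=1.0, p=3, num_examples=5, rng=rng)
print(sigma @ xi_star / xi_star.size)  # overlap of each example with the teacher
```

Sampling the overlap first avoids a Markov chain for this toy case. A student would then infer a pattern from `sigma` alone, which is the inverse problem the paper studies.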

Paper Structure (23 sections, 72 equations, 9 figures)


Introduction
Overview of Gardner's results
Teacher-student setting
Matched interaction orders
Mismatched interaction orders
Results and Discussion
Retrieval transition at large interaction order
Transition to the ordered phases: Universality
Phase diagram on the Nishimori line
Inference temperature vs dataset noise
Interaction order and noise tolerance
Large noise scaling
Finite noise scaling
Robustness against adversarial attacks
Conclusion
...and 8 more sections

Figures (9)

Figure 1: RS phase diagrams of the direct models with $p = 3$ on the left and $p = 10$ on the right. Accurate pattern retrieval is not possible in the paramagnetic phase ($P$) or in the spin-glass phase ($SG$), but it is possible in the local retrieval phase ($lR$) and in the global retrieval phase ($gR$). The ferromagnetic fixed point corresponding to accurate pattern retrieval is globally stable in the $gR$ phase, but only locally stable in the $lR$ phase. The phase diagrams are inexact below the white dashed line, where the total entropy of the paramagnetic phase becomes negative. The black dotted line overlaying the $p = 3$ diagram is the (exact) 1RSB $P$-$SG$ transition temperature $T_s \left( \alpha, 3 \right)$, which is obtained by rescaling the corresponding transition temperature of the spin-glass model with $p$-body Gaussian interactions by $\sqrt{2 \alpha}$. The d1RSB transition $T_d \left( \alpha, 3 \right)$ is very close to $T_s \left( \alpha, 3 \right)$ throughout the displayed range of $\alpha$. The white dotted line in the $p = 3$ plot is the temperature $T_G \left( \alpha, 3 \right)$ below which multiple steps of RSB are required to compute the free entropy. It is also obtained by rescaling the corresponding transition temperature of the Gaussian spin-glass model by $\sqrt{2 \alpha}$.
Figure 2: Exact RS phase diagrams of inverse models on the Nishimori line, i.e. $p^* = p$ and $\beta^* = \beta$. The left, center and right plots respectively have $p = 3$, $p = 4$ and $p = 10$. Accurate pattern retrieval is not possible in the paramagnetic phase ($P$), but it is possible in the local retrieval phase ($lR$), in the global retrieval phase ($gR$) and in the example retrieval phase ($eR$). The ferromagnetic fixed point corresponding to accurate pattern retrieval is globally stable in the $gR$ phase, but only locally stable in the $lR$ phase. The critical temperature of the $eR$ phase is the critical temperature $T_{\text{crit}}$ of the direct problem with one pattern (see Fig. 1, $\alpha = 0$ axis). The black dashed lines mark the spurious continuation of the $lR$ and $gR$ phase boundaries through the $eR$ phase. The white dashed line is the $p \rightarrow \infty$ $gR$ critical line calculated analytically in the section "Retrieval transition at large interaction order". It matches the corresponding numerical phase boundary increasingly well as $p$ grows larger. The white dotted lines on the $p = 3$ plot mark the 1RSB and d1RSB critical temperatures $T_s \left( \alpha, 3 \right)$ and $T_d \left( \alpha, 3 \right)$ of the direct model (see the section "Overview of Gardner's results"). We truncated them below $T_{\text{crit}}$ for improved visibility. $T_s \left( \alpha, 3 \right)$ and $T_d \left( \alpha, 3 \right)$ are obtained by rescaling the corresponding critical temperatures found in montanari2003nature by $\sqrt{2 \alpha}$.
Figure 3: The first row of this diagram sketches how a $p$-body Hopfield network in the teacher-student setting can reconstruct an incomplete pattern $\xi^b$ to match the teacher pattern $\xi^*$ by relying on the examples $\sigma$ obtained from $\xi^*$. The second row summarizes how a dense neural network trained by Krotov and Hopfield can recover the labels $y'$ of the data $x$ given the weights $w$ learned from $x$ (krotov2016dense). Both models tackle similar tasks using an approach where $\sigma$ and $\xi^b$ respectively play the same roles as $w$ and $(x, y')$. The forward propagation algorithm used to generate $y'$ is similar to the update rule of the student (see krotov2016dense and Appendix \ref{app:hamiltonians}), but the backpropagation algorithm used to learn $w$ is very different from the update rule of the teacher.
Figure 4: Monte-Carlo simulations of the $p = 3$ inverse model compared against RS saddle-point solutions. The $lR$ phase is included on the left and central plots, but not on the right one. The left plot has $\varepsilon = 0$, and the two other ones have a handpicked $\varepsilon$ such that the simulations are initialized near the saddle-point solutions. The dots are simulation data at a few values of $\alpha$, and the lines are slices of the saddle-point solutions at the same $\alpha$. The teacher generates $M = \frac{\alpha N^{p-1}}{p!}$ examples $\sigma^a$ with $N = 512$ components each, and the simulation results are then averaged over $L = 100$ student patterns. The simulation data is sometimes systematically shifted up with respect to the saddle-point solution. This difference is notably visible on the central plot, right after the fall from $eR$ to $gR$ when $\alpha = 3$.
Figure 5: RS phase diagrams of inverse models with $p^* = p$ and fixed $\beta^*$. The top and bottom rows of plots respectively have $p^* = p = 3$ and $p^* = p = 4$. In the same way, the left, central and right columns correspond to $T^* = 1.5$, $T^* = 1.6$ and $T^* = 1.7$. Accurate pattern retrieval is not possible in the paramagnetic phase ($P$), in the spin-glass phase ($SG$) or in the example retrieval phase ($eR$), but it is possible in the local retrieval phase ($lR$) and in the global retrieval phase ($gR$). The ferromagnetic fixed point corresponding to accurate pattern retrieval is globally stable in the $gR$ phase, but only locally stable in the $lR$ phase. Conversely, the $SG$ fixed point is always locally stable and leads the student to a frozen spurious signal. The white dashed line indicates the Nishimori line $\beta^* = \beta$. The black dashed line is the $gR$ phase boundary on the Nishimori line. As explained in Section \ref{sec:match_discussion}, we expect it to overlap with the exact $SG$ phase transition.
...and 4 more figures
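
The example count quoted in the Figure 4 caption, $M = \alpha N^{p-1} / p!$, can be made concrete with a short calculation. The sketch below evaluates it for $N = 512$ and $p = 3$. The function name `num_examples` is hypothetical, the values of $\alpha$ other than $\alpha = 3$ are chosen only for illustration, and rounding to the nearest integer is an assumption, since the caption does not say how non-integer values are handled.

```python
from math import factorial

def num_examples(alpha, n, p):
    """Number of teacher examples M = alpha * N^(p-1) / p! (Figure 4 caption)."""
    return alpha * n ** (p - 1) / factorial(p)

# N = 512 and p = 3 as in the Figure 4 caption; alpha values are illustrative.
for alpha in (1, 2, 3):
    m = num_examples(alpha, n=512, p=3)
    print(f"alpha = {alpha}: M = {m:.2f}, rounded to {round(m)}")
```

For $p = 3$ the dataset already holds tens of thousands of examples at $\alpha$ of order one, because $M$ grows as $N^{p-1}$ rather than linearly in $N$.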
