Supervised and Unsupervised protocols for hetero-associative neural networks

Andrea Alessandrelli, Adriano Barra, Andrea Ladiana, Andrea Lepre, Federico Ricci-Tersenghi

TL;DR

This work develops a three-directional associative memory (TAM) framework with three binary layers and $K$ archetypes, learned under supervised and unsupervised Hebbian protocols. It leverages replica-symmetric analysis and Guerra interpolation to derive exact self-consistency equations for order parameters and to map retrieval vs. noise phase diagrams, validated by Monte Carlo simulations on random and structured data. A key finding is layer cooperativity: information-rich layers can bolster poorer ones, expanding the retrieval region beyond layer-local noise constraints. The results illuminate principled design guidelines for hetero-associative architectures and connect to neurobiological concepts of pattern completion and pattern separation, with potential cross-disciplinary insights for AI and neuroscience.

Abstract

This paper introduces a learning framework for Three-Directional Associative Memory (TAM) models, extending the classical Hebbian paradigm to both supervised and unsupervised protocols within a hetero-associative setting. These neural networks consist of three interconnected layers of binary neurons interacting via generalized Hebbian synaptic couplings that allow learning, storage and retrieval of structured triplets of patterns. Relying on glassy statistical mechanical techniques (mainly replica theory and Guerra interpolation), we analyze the emergent computational properties of these networks, operating on random (Rademacher) datasets and at the replica-symmetric level of description: we obtain a set of self-consistency equations for the order parameters that quantify the critical dataset sizes (i.e. their thresholds for learning) and describe the retrieval performance of these networks, highlighting the differences between supervised and unsupervised protocols. Numerical simulations validate our theoretical findings and show that the resulting picture of TAMs also holds for structured datasets. In particular, this study provides insights into the cooperative interplay of layers, beyond that of the neurons within the layers, with potential implications for the optimal design of artificial neural network architectures.
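To make the two protocols concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual convention of the Hebbian learning-from-examples literature: each example is an archetype whose entries are kept with probability $(1+r)/2$; the supervised coupling averages the examples of each archetype before taking the Hebbian product, while the unsupervised coupling sums products over individual examples. The normalization, the equal number of examples $M$ on every layer, the unit inter-layer strengths, the update order and all function names are assumptions made for illustration.

```python
import numpy as np

def make_examples(rng, xi, M, r):
    """Return M noisy copies of each archetype; each entry is kept with probability (1 + r) / 2."""
    K, N = xi.shape
    chi = np.where(rng.random((K, M, N)) < (1 + r) / 2, 1, -1)
    return xi[:, None, :] * chi  # shape (K, M, N)

def supervised_coupling(eta_a, eta_b):
    """A teacher groups examples by archetype: average within each group, then Hebbian product."""
    mean_a, mean_b = eta_a.mean(axis=1), eta_b.mean(axis=1)  # (K, N_a), (K, N_b)
    return mean_a.T @ mean_b / np.sqrt(mean_a.shape[1] * mean_b.shape[1])

def unsupervised_coupling(eta_a, eta_b):
    """No grouping: Hebbian sum over every example, paired across layers by example index."""
    K, M, N_a = eta_a.shape
    N_b = eta_b.shape[2]
    flat_a, flat_b = eta_a.reshape(K * M, N_a), eta_b.reshape(K * M, N_b)
    return flat_a.T @ flat_b / (M * np.sqrt(N_a * N_b))

def sign(h):
    return np.where(h >= 0, 1, -1)

def retrieve(J_st, J_sp, J_tp, sigma, tau, phi, sweeps=5):
    """Zero-temperature layer-by-layer updates; tau and phi are refreshed first so the sigma cue is used."""
    for _ in range(sweeps):
        tau = sign(J_st.T @ sigma + J_tp @ phi)
        phi = sign(J_sp.T @ sigma + J_tp.T @ tau)
        sigma = sign(J_st @ tau + J_sp @ phi)
    return sigma, tau, phi

def demo(protocol="supervised", K=3, N=(300, 300, 300), M=40, r=0.7, seed=0):
    rng = np.random.default_rng(seed)
    xi = [rng.choice([-1, 1], size=(K, n)) for n in N]      # Rademacher archetype triplets
    eta = [make_examples(rng, x, M, r) for x in xi]
    build = supervised_coupling if protocol == "supervised" else unsupervised_coupling
    J_st, J_sp, J_tp = build(eta[0], eta[1]), build(eta[0], eta[2]), build(eta[1], eta[2])
    # Cue: archetype 0 on the sigma layer, each entry flipped with probability 0.1;
    # the other two layers start from random configurations.
    sigma = xi[0][0] * np.where(rng.random(N[0]) < 0.1, -1, 1)
    tau, phi = rng.choice([-1, 1], size=N[1]), rng.choice([-1, 1], size=N[2])
    final = retrieve(J_st, J_sp, J_tp, sigma, tau, phi)
    # Mattis magnetization of each layer with its own archetype 0
    return [float(x[0] @ s) / len(s) for x, s in zip(xi, final)]

if __name__ == "__main__":
    for protocol in ("supervised", "unsupervised"):
        print(protocol, demo(protocol))
```

At this small load a noisy cue on one layer is enough to pull the other two layers onto the matching archetypes, which is the hetero-associative retrieval of a pattern triplet described above. Raising `K` or lowering `r` and `M` in `demo` is a quick way to probe where retrieval breaks down in this toy setting.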

Paper Structure

This paper contains 15 sections, 2 theorems, 195 equations, 11 figures, 2 algorithms.

Key Result

Lemma A.1

Given the explicit expression of $\hat{\xi}^1$ (Eq. \ref{eq:rho_def}), the following expectation can be approximated by applying the CLT to the sum over $a$, under the assumption $M_1\gg 1$, namely

Figures (11)

  • Figure 1: (left) Representation of the TAM neural network as given by Eq. \ref{eq:hamTAM}. In this depiction, each layer is composed of binary (Ising) units interacting pairwise in a generalized Hebbian fashion (see Eqs. \ref{sinapsigmatau}–\ref{sinapsitauphi} and the cost function \ref{eq:hamTAM}). In the illustrated example, the layer $\bm\sigma$ comprises $N_1=4$ neurons, the layer $\bm\tau$ comprises $N_2=3$ neurons, and the layer $\bm\phi$ comprises $N_3=2$ neurons. (right) Integral representation of the TAM neural network \cite{TAMstoring}, as provided by Eq. \ref{eq:rappresentazioneintegrale_sup}. In this formulation, the three visible layers (depicted on the left) are no longer directly interconnected; instead, each is coupled to a corresponding hidden layer that mediates all further interactions. Notably, while the visible layers consist of standard binary neurons, the hidden layers are composed of highly selective “grandmother” units \cite{grandmother,agliariEmergence} that activate only when the pattern they encode is either presented or reconstructed on the associated visible layer, thereby facilitating the retrieval of pattern triples.
  • Figure 2: Phase diagrams of the TAM network in the supervised setting in the noise versus storage plane at $\alpha = \theta = 1$, obtained by numerically solving Eqs. \ref{seldefinitive} and \ref{seldefinitive1} (only for the $\bm\sigma$ layer, since the network is symmetric). The analysis covers different inter-layer interaction strengths and various values of the dataset entropy $\rho$, as indicated in the titles and legends. Each blue solid line marks the phase transition of the entire network for a specific value of the dataset entropy, i.e., $\rho_1=\rho_2=\rho_3=\rho$. It separates the working region (bottom left), where archetypes are learned and can thus be retrieved and generalized, from the blackout region (top right), where spin-glass effects prevail. The retrieval region is determined by the conditions $|m^\sigma_{\xi_1^1}|, |m^\tau_{\xi_1^1}|, |m^\phi_{\xi_1^1}| \sim 1$: these constraints are all simultaneously satisfied below the solid line, while above it all magnetizations vanish. The influence of $\rho$ is clearly visible: as $\rho$ increases, the retrieval region becomes progressively narrower in all diagrams. For $\rho = 0$, we recover the results of the standard BAM case of Kosko \cite{kosko1988bidirectional} (first panel) and those for the TAM \cite{TAMstoring} (second and third panels). Insets: MC simulations at zero fast noise ($\beta^{-1} = 0$) with a symmetric network ($N_1 = N_2 = N_3 = 400$), showing the Mattis magnetizations $m_{\xi_1}$ across the layers as a function of the network load $\gamma$ for different $\rho$. The simulations agree with the theoretical predictions and correctly locate the maximum load beyond which the network stops functioning.
  • Figure 3: Capacity, connectivity, and efficiency in TAM networks as functions of asymmetry. This figure explores how the asymmetry parameters $\alpha$ and $\theta$ shape the performance of TAM networks. The left panel shows that the maximum load $\gamma_{max}$ is attained in the symmetric case $(\alpha, \theta) = (1,1)$, confirming that balanced architectures favor memory capacity. The central panel reports the number of synapses $|E_3|$, which also peaks under symmetry due to maximal inter-layer connectivity. Interestingly, the right panel, showing the ratio $\gamma_{max}/|E_3|$ as a measure of synaptic efficiency, reveals that the most efficient configuration occurs not in the fully symmetric regime, but at a slightly asymmetric point around $\alpha = \theta \approx 1.92$. This indicates that a mild asymmetry can provide the best trade-off between capacity and synaptic cost.
  • Figure 4: Numerical solution of the self-consistency Eqs. \ref{eq:selfMagnSigma}–\ref{eq:selfQPhi} for $\left(\alpha,\theta\right)=(1,1)$ in different control parameter planes. (left) The solution is represented in the $(\gamma,\beta^{-1})$ plane with dataset entropies $r_1=r_2=r_3=0.3$ and $M=100$, and $(g_{\sigma\tau},g_{\sigma\phi},g_{\tau\phi})=(1,1,1)$. (right) The solution is shown in the $(\rho,\beta^{-1})$ plane with $\gamma=0.1$. In both cases, only the $\bm\sigma$ layer is shown, owing to the symmetry of the network. The black dashed line is the boundary between the region where the $\bm\sigma$-layer magnetization is greater than $0.1$ and the region where it is less than $0.1$. The red line is the corresponding boundary for a $\bm\sigma$-layer magnetization of $0.9$. Unlike the supervised case, where a clear phase transition can be observed, in the unsupervised setting such interpretability is not preserved, due to the approximations involved in solving the model (see App. \ref{appsec:proofGuerra}).
  • Figure 5: Monte Carlo simulation of a symmetric $(\alpha =1, \ \theta=1)$ TAM network. The number of archetypes is fixed at $K= 40$ for each layer, with $M_1=M_2=M_3 =20$. In each plot, curves in different shades of blue correspond to different numbers of neurons (i.e., $N=N_1=N_2=N_3 \in \{100,400,800,1600\}$), while the orange curves show the same quantities in the limit $\gamma\to 0$. Each plot represents a distinct set of MC simulations that differ only in the quality of the dataset, denoted by $r$: specifically, from left to right, we set $\bm r=(0.9, 0.6, 0.3)$. The threshold value (critical temperature), beyond which the magnetization drops sharply as the temperature increases, can be determined from the phase diagram in the thermodynamic limit $N_1 \to \infty$. Due to the symmetry of the network, we present the results for only one layer, $\bm\sigma$. Error bars show the standard deviation over 100 independent runs.
  • ...and 6 more figures
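
The kind of finite-temperature Monte Carlo experiment summarized in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code behind the figure: it uses supervised Hebbian couplings with a $1/N$ normalization, unit inter-layer strengths, heat-bath (Glauber) updates applied one layer at a time, and much smaller sizes and fewer runs than the caption reports. All function names and default values are hypothetical.

```python
import numpy as np

def supervised_couplings(rng, K, N, M, r):
    """Archetype triplets and supervised Hebbian couplings (assumed 1/N normalization)."""
    xi = rng.choice([-1, 1], size=(3, K, N))                 # one archetype set per layer
    keep = rng.random((3, K, M, N)) < (1 + r) / 2            # entry kept with probability (1 + r) / 2
    eta_mean = (xi[:, :, None, :] * np.where(keep, 1, -1)).mean(axis=2)  # shape (3, K, N)
    J = {(a, b): eta_mean[a].T @ eta_mean[b] / N
         for a in range(3) for b in range(3) if a != b}
    return xi, J

def heat_bath_run(rng, xi, J, beta, sweeps=30):
    """Start on archetype triplet 0 and apply layer-wise heat-bath updates at inverse temperature beta."""
    N = xi.shape[2]
    s = [xi[a, 0].copy() for a in range(3)]
    for _ in range(sweeps):
        for a in range(3):
            # Neurons within a layer only feel the other two layers, so they can be updated together.
            h = sum(J[(a, b)] @ s[b] for b in range(3) if b != a)
            p_up = 0.5 * (1.0 + np.tanh(beta * h))
            s[a] = np.where(rng.random(N) < p_up, 1, -1)
    return float(xi[0, 0] @ s[0]) / N                        # Mattis magnetization of the sigma layer

def magnetization_curve(temperatures, runs=20, K=3, N=200, M=20, r=0.6, seed=0):
    """Mean and standard deviation of the sigma-layer magnetization over independent runs."""
    rng = np.random.default_rng(seed)
    curve = []
    for T in temperatures:
        m = [heat_bath_run(rng, *supervised_couplings(rng, K, N, M, r), beta=1.0 / T)
             for _ in range(runs)]
        curve.append((T, float(np.mean(m)), float(np.std(m))))
    return curve

if __name__ == "__main__":
    for T, mean, std in magnetization_curve([0.05, 0.5, 1.0, 2.0, 10.0]):
        print(f"T={T:5.2f}  m={mean:+.3f} +/- {std:.3f}")
```

Each run draws a fresh dataset, so the reported standard deviation mixes thermal and dataset fluctuations. The temperature at which the magnetization drops depends on the assumed normalization and should not be compared quantitatively with the paper's phase diagrams.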

Theorems & Definitions (7)

  • Remark 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma A.1
  • proof
  • Lemma A.2