Networks of neural networks: more is different

Elena Agliari, Andrea Alessandrelli, Adriano Barra, Martino Salomone Centonze, Federico Ricci-Tersenghi

TL;DR

The paper investigates whether a modular network of Hopfield models can realize pattern disentanglement, that is, separating constituent signals from mixtures, by coupling layers with imitative intra-layer and anti-imitative inter-layer Hebbian interactions. Using replica-symmetric (RS) statistical mechanics and Guerra interpolation, it derives a tractable description in the low-storage limit and analyzes fixed points and Hessian stability to identify the conditions under which the input spurious state is driven toward the desired set of patterns across layers. Numerical solutions of the RS saddle-point equations and Monte Carlo simulations validate a disentangling region in parameter space, showing that a moderate amount of noise is essential to avoid stable spurious states and achieve robust separation. The findings demonstrate a concrete manifestation of the 'more is different' principle in neural networks and suggest avenues for improving disentanglement through higher-order inter-layer couplings and RBM-like dual representations, with potential applications in signal separation for complex data.

Abstract

The common thread behind the recent Nobel Prize in Physics to John Hopfield and those conferred to Giorgio Parisi in 2021 and Philip Anderson in 1977 is disorder. Quoting Philip Anderson: "more is different". This principle has been extensively demonstrated in magnetic systems and spin glasses, and, in this work, we test its validity on Hopfield neural networks to show how an assembly of these models displays emergent capabilities that are not present at the single-network level. Such an assembly is designed as a layered associative Hebbian network that, beyond accomplishing standard pattern recognition, also spontaneously performs pattern disentanglement. Namely, when given a composite signal -- e.g., a musical chord -- it can return the individual constituent elements -- e.g., the notes making up the chord. Here, restricting to notes coded as Rademacher vectors and chords that are their mixtures (i.e., spurious states), we use tools borrowed from the statistical mechanics of disordered systems to investigate this task, obtaining the conditions on the model control parameters under which pattern disentanglement is successfully executed.
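
To make the "chord as a mixture of notes" picture concrete, the following minimal sketch builds three Rademacher patterns and one mixture of them. It assumes the mixture is the symmetric one, taken as the entry-wise majority vote (sign of the sum) of the three patterns; this choice, the helper names `rademacher`, `mixture`, and `overlap`, and the size `N = 5000` are illustrative assumptions, not definitions taken from the paper.

```python
import random


def rademacher(n, rng):
    """Return n i.i.d. entries, each +1 or -1 with probability 1/2."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def mixture(patterns):
    """Entry-wise majority vote over an odd number of +/-1 patterns.

    With an odd number of patterns the column sum is never zero,
    so the sign is always well defined.
    """
    return [1 if sum(column) > 0 else -1 for column in zip(*patterns)]


def overlap(a, b):
    """Normalized overlap (magnetization) between two +/-1 configurations."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)   # fixed seed, for reproducibility only
N = 5000                 # illustrative size (assumption)

notes = [rademacher(N, rng) for _ in range(3)]   # the three "notes"
chord = mixture(notes)                           # the "chord" (spurious state)

# Each entry of the chord agrees with a given note with probability 3/4,
# so every overlap fluctuates around 2 * 3/4 - 1 = 1/2.
overlaps = [overlap(chord, xi) for xi in notes]
print(overlaps)
```

The chord is therefore equally and only partially correlated with each note, which is why retrieving the notes separately requires more than a single-network recall.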

Paper Structure

This paper contains 17 sections, 84 equations, 13 figures, 3 algorithms.

Figures (13)

  • Figure 1: Schematic representation of the model under study, where we set $L=3$. The three contributions making up the cost function \ref{eq:HamHam} are highlighted: imitative intra-layer interactions (represented by a $\oplus$ loop), anti-imitative inter-layer interactions (represented by a $-\lambda$ double arrow) and the coupling with an external field (represented by a single arrow $\boldsymbol h$).
  • Figure 2: We initialize the system in the configuration $\bm \sigma^{(1,2,3)}$ (first row), $\bm \sigma^{(1,1,1)}$ (second row), $\bm \sigma^{(1,1,1')}$ (third row), and $\bm \sigma^{(h)}$ (fourth row), and we evaluate analytically the related one-step magnetization \ref{eq:magn_evolv}, thereby deriving the stability region (black line) in the $(H,\lambda)$ plane for that solution. Specifically, these lines are obtained by setting $\gamma=0.01$ and by determining the region of the plane in which the one-step magnetization (i.e., the error functions in, respectively, eqs. \ref{eq:erf_123}, \ref{eq:erf_111}, \ref{eq:erf_11a}, \ref{eq:erf_11b}, \ref{eq:erf_h}) exceeds a threshold, which we set to $0.95$; for $\bm \sigma^{(h)}$ no boundaries are detected in the region under consideration. The shade of the color reflects the energy of the related fixed point: the lower the energy, the darker the color; see also eqs. \ref{eq:HHH0}, \ref{eq:HHH}, \ref{eq:HHH11_1}, \ref{eq:HHH1}. Thus, for small $H$, although $\bm \sigma^{(1,2,3)}$ turns out to be stable, its energy is relatively close to zero. These analytical predictions are validated against computational results that assess the stability of each configuration against small perturbations. To this purpose, we initialize the system in a configuration obtained from $\bm \sigma^{(1,2,3)}$ (first row), from $\bm \sigma^{(1,1,1)}$ (second row), from $\bm \sigma^{(1,1,1')}$ (third row), and from $\bm \sigma^{(h)}$ (fourth row), by randomly flipping its entries: the flip is implemented by multiplying each neuron variable $\sigma_i^a$ by a random variable $\chi_i^a$ drawn from $P(\chi)= \frac{1+r}{2}\delta(\chi-1)+\frac{1-r}{2}\delta(\chi+1)$, where $r=1.0$ (left column), $r=0.8$ (middle column), and $r=0.5$ (right column). The larger $r$, the closer the initial configuration is to the reference. Then, we implement the dynamics \ref{eq:evolv} with $T=0$ until convergence to a fixed point. This is repeated for several choices of the parameters $H$ and $\lambda$, sampled uniformly in, respectively, $[0,2]$ and $[0,0.5]$, and for fixed $N=5000$ and $K=50$. Different final states are recorded and represented by different symbols and colors, as reported in the legend on the right: $\bm \sigma^{(1,2,3)}$ (green $\times$), $\bm \sigma^{(1,1,1)}$ (blue $+$), $\bm \sigma^{(1,1,1')}$ (magenta $\circ$), $\bm \sigma^{(h)}$ (red $\triangle$), or none of those considered in this section (gray $\bullet$). The patterns presented in the figure are for illustrative purposes only, as both analytical and numerical results are obtained for a Rademacher dataset; for an analysis involving structured data we refer to Sec. \ref{sec:MC}.
  • Figure 3: The solid lines represent the numerical solution of the self-consistency equations \ref{eq:self} in the low-load regime and in the absence of an external field, obtained by applying the fixed-point iteration method with initial point given by $\bar{\boldsymbol m}_{\{\mu \leq L\}}^{(1,2,3)}$ (left) and by $\bar{\boldsymbol m}_{\{\mu\leq L\}}^{(h)}$ (right), see eqs. \ref{eq:mm1}-\ref{eq:mm2}. These numerical solutions preserve the structure of the initial datum. Specifically, on the left, the solid lines show the behavior of $\bar{m}_1^1=\bar{m}_2^2=\bar{m}_3^3$, while $\bar{m}_{\mu}^a$ vanishes for $\mu \neq a$; on the right, the solid lines show the behavior of $\bar{m}_{\mu}^{a}$, which coincides for any $a \in \{1,2,3\}$ and $\mu \in \{1,2,3\}$. The persistence of the structure of the solution is lost at a certain value of $\beta^{-1}$, highlighted by the vertical dotted lines: beyond these values, which depend on $\lambda$ (see the common legend on the right), solutions with a different structure appear, and these correspond, for instance, to the state $\boldsymbol \sigma^{(1,1,1')}$.
  • Figure 4: Both panels present the range in the parameter space $(\beta, \lambda, H=0)$ where the three-layer model is expected to work as a pattern disentangler. Below the red line the target configuration $\boldsymbol \sigma^{(1,2,3)}$ is stable, while above the green line the spurious configuration $\boldsymbol \sigma^{(h)}$ is unstable. The two lines are found by studying the sign of the Hessian $D_{\mu \nu}^{aa}$, obtained for $N \to \infty$ and $\gamma=0$, as reported in Sec. \ref{sec:LLNoisy} and App. \ref{sec:spectrum}. The dashed lines are found by solving the self-consistency equations \ref{eq:self} by the fixed-point iteration method, starting from $\boldsymbol \sigma^{(h)}$, as explained in Sec. \ref{sec:ssr}. More precisely, in the region between the two dashed curves, the solution found in this way corresponds to $\boldsymbol \sigma^{(1,2,3)}$; therefore, in that region we expect the machine to work successfully. Notice that the region determined by this method lies, consistently, within the region outlined by the stability analysis and, since it is derived from the self-consistency equations holding under the RS assumption and in the thermodynamic limit, it is expected to be subject to the same conditions. As a final test, useful to check possible finite-size corrections, we run MC simulations with a network made of $N=5000$ neurons and $K=5$ patterns, by initializing the system in the configuration $\boldsymbol \sigma^{(h)}$, updating it according to \ref{eq:evolv}, and keeping track of whether the stable state corresponds to, or at least is strongly correlated with, $\boldsymbol \sigma^{(1,2,3)}$: if the measured magnetizations $m_1^1$, $m_2^2$, and $m_3^3$ (or suitable permutations) are simultaneously larger than $0.99$ (left panel) or than $0.95$ (right panel), the experiment is considered successful. Each trial is repeated $50$ times, for several choices of the parameters $\beta$ and $\lambda$, and the accuracy is estimated as the fraction of successful trials (see the colormap). We remark that an overall very good agreement between the theoretical predictions and the numerical outcomes is obtained.
  • Figure 5: We estimate the region in the $(\beta, \lambda)$ plane where the three-layer model is expected to successfully disentangle mixtures of three patterns by solving the self-consistency equations \ref{eq:self} (dashed lines) and by running MC simulations (color map), in analogy to Figure \ref{fig:allyoucan}; in both cases we considered several values of the external field, $H=0.0$ (left column), $H=0.1$ (middle column), $H=0.2$ (right column), and two different thresholds on the magnetizations, $m>0.95$ (upper row) and $m>0.99$ (lower row). For the first method, we set $\gamma =0$ and, as explained in Sec. \ref{sec:ssr}, we found a region, bounded by the dashed lines, where the input configuration $\boldsymbol \sigma^{(h)}$ is attracted to the target output configuration $\boldsymbol \sigma^{(1,2,3)}$; thus, within that region the system is expected to accomplish pattern disentanglement. For the second method, we set $N=5000$ and $K=5$, initialize the system in the configuration $\boldsymbol \sigma^{(h)}$, and run the noisy dynamics \ref{eq:evolv} until convergence to a stationary state. Then, the magnetizations of the three layers with respect to the patterns $\boldsymbol \xi^1$, $\boldsymbol \xi^2$, $\boldsymbol \xi^3$ are evaluated and, if each of the three patterns is retrieved with a quality at least equal to the given threshold (no matter which layer retrieves a certain pattern), the disentanglement achieved in that simulation is considered successful. The accuracy is finally evaluated over the sample of $50$ trials and represented by the color map.
  • ...and 8 more figures
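
Two procedures described in the captions above can be sketched in a few lines: the random perturbation of a reference configuration through $P(\chi)$ (Figure 2) and the success criterion "each pattern is retrieved by some layer above a threshold, up to a permutation of the layers" (Figures 4 and 5). This is a minimal sketch under stated assumptions: the function names `perturb`, `overlap`, and `disentangled` are hypothetical, the strict inequality follows the "larger than" wording of Figure 4, and the network dynamics \ref{eq:evolv} are not reproduced here, so the layer configurations in the demo are set by hand rather than obtained from a simulation.

```python
import random
from itertools import permutations


def perturb(config, r, rng):
    """Multiply each entry by chi, with chi = +1 w.p. (1+r)/2 and -1 w.p. (1-r)/2."""
    p_keep = (1 + r) / 2
    return [s if rng.random() < p_keep else -s for s in config]


def overlap(a, b):
    """Normalized overlap (magnetization) between two +/-1 configurations."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def disentangled(layers, patterns, threshold):
    """True if some one-to-one assignment of patterns to layers
    has every layer-pattern overlap above the threshold."""
    m = [[overlap(sigma, xi) for xi in patterns] for sigma in layers]
    n_layers = len(layers)
    return any(
        all(m[a][p[a]] > threshold for a in range(n_layers))
        for p in permutations(range(len(patterns)), n_layers)
    )


rng = random.Random(1)   # fixed seed, for reproducibility only
N = 5000                 # same size as quoted in the captions
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(3)]

# Perturbed copy of the first pattern: its overlap with the original
# fluctuates around r, since E[chi] = r.
noisy = perturb(patterns[0], 0.8, rng)
m_noisy = overlap(noisy, patterns[0])

# Hand-made final states (assumption: stand-ins for simulation outputs).
shuffled_ok = disentangled([patterns[2], patterns[0], patterns[1]], patterns, 0.99)
collapsed_ok = disentangled([patterns[0]] * 3, patterns, 0.99)
print(m_noisy, shuffled_ok, collapsed_ok)
```

In this sketch a state where the three layers hold the three patterns in any order counts as a success, whereas a state where all layers retrieve the same pattern does not; repeating the check over independent runs and averaging gives an accuracy estimate of the kind shown in the color maps.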