The Copycat Perceptron: Smashing Barriers Through Collective Learning

Giovanni Catania; Aurélien Decelle; Beatriz Seoane

TL;DR

Coupling the replicas bends the phase diagram towards smaller values of α. This suggests that the free entropy landscape becomes smoother around the solution with perfect generalization, so that standard thermal updating algorithms can easily reach the teacher solution without getting trapped in metastable states.

Abstract

We characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, subject to a suitable cost function, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present and affects each student's generalization performance. In the nonzero-temperature regime, we find that coupling the replicas bends the phase diagram towards smaller values of $\alpha$. This suggests that, at a fixed fraction of examples, the free entropy landscape becomes smoother around the solution with perfect generalization (i.e., the teacher). Standard thermal updating algorithms such as Simulated Annealing can then easily reach the teacher solution and avoid the metastable states in which they get trapped in the unreplicated case, even in the computationally easy regime of the inference phase diagram. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimal property of Replicated Simulated Annealing (RSA) for a sufficient number of replicas. From a learning perspective, these results also suggest that multiple students working together (in this case, reviewing the same data) can learn the same rule both significantly faster and from fewer examples, a property that could be exploited in cooperative and federated learning.
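As a rough illustration of the model described above, the energy of $y$ coupled students can be sketched as a training cost per student plus $\gamma$ times the Hamming distance between every pair of students (the fully connected coupling shown in Figure S1). The abstract only says the cost function is "suitable", so counting misclassified examples is an assumption here, as is the hypothetical function name `coupled_energy`; this is an illustrative sketch, not the authors' code.

```python
import numpy as np


def coupled_energy(W, X, labels, gamma):
    """Hypothetical energy of y coupled binary students (illustrative sketch).

    W      : (y, N) array of +/-1 student weights
    X      : (P, N) array of +/-1 training inputs (N odd, so fields are never zero)
    labels : (P,) teacher labels in {-1, +1}
    gamma  : strength of the ferromagnetic coupling between students
    """
    # Assumed cost: number of training examples each student misclassifies.
    predictions = np.sign(X @ W.T)  # shape (P, y)
    n_errors = int(np.sum(predictions != labels[:, None]))

    # Coupling: gamma * Hamming distance, summed over all pairs of students.
    y = W.shape[0]
    hamming = 0
    for a in range(y):
        for b in range(a + 1, y):
            hamming += int(np.sum(W[a] != W[b]))
    return n_errors + gamma * hamming
```

For $\pm 1$ weights each pair contributes $(N - w_a \cdot w_b)/2$ to the Hamming term, so the energy is lowered when students agree with one another, which is what makes the coupling ferromagnetic.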

Paper Structure (10 sections, 33 equations, 7 figures)

Derivation of the quenched free entropy
Replica symmetric ansatz on both spaces
Saddle point equations
Effect of $\gamma$
Mean-field theory of the single perceptron
RS free entropy
1-RSB free entropy
Dynamic Transition line
Simulated Annealing's implementation details and additional numerical results
Approximate Message Passing

Figures (7)

Figure 1: $(\alpha,T)$ equilibrium phase diagram of the single, non-replicated binary perceptron (top panel) and of the replicated perceptron defined by Eq. \ref{eq:HamiltonianCoupledPerceptrons}, for a fixed coupling $\gamma=1$ and $y$ replicas (bottom panel). The shaded regions mark the impossible (red), hard (yellow), and easy (green) inference phases, as extrapolated from $T=0$. Regions (a)-(e) are discussed in the text.
Figure 2: Numerical performance of replicated SA (RSA) with an annealing rate $\eta=10^{-5}$. Black lines show the performance of the single perceptron; colored lines show the coupled perceptrons for different values of $y$. (a): Empirical probability (over $100$ training instances) of finding the teacher configuration at the end of the annealing process, as a function of $\alpha$, for $N=2001$ and $\gamma=1$. The inset (b) shows the corresponding mean generalization error $\varepsilon_g = \pi^{-1}\arccos(R)$. The colored regions are the same as in Fig. 1; in particular, the hard-easy phase boundary corresponds to the algorithmic threshold of AMP at $T=0$. (c) and (d): Examples of typical annealing trajectories for two values of $\alpha$ ($10$ training trajectories are shown for each $y$). Settings and color coding are the same as in the top panel. The white-outlined lines show the analytical result obtained by solving the RS self-consistent equations, starting from a poorly generalizing solution ($R \ll 1$) at $T=0.5$ and following the fixed point as the temperature is linearly decreased.
Figure S1: Schematic representation of the model defined by Eq. \ref{eq:HamiltonianCoupledPerceptrons} for $y=4$ student perceptrons interacting over a fully connected graph. The dashed blue lines represent the coupling $\gamma$ between students.
Figure S2: Phase diagram of $y=2$ coupled binary perceptrons in the $(\alpha, T)$ plane for different values of $\gamma$ (given in the legend), compared with the single perceptron in the RS approximation (black lines).
Figure S3: Numerical performance of SA/RSA. The top row shows the empirical probability that the teacher configuration is found at the end of the SA; the bottom row shows the mean generalization error. Both quantities are plotted versus the fraction of samples $\alpha$, for a system with $N=2001$ weights and for different values of the annealing rate $\eta$ (shown at the top of each column). The single, non-replicated perceptron (black lines) is compared with systems of $y$ coupled perceptrons for different values of $y$, at fixed coupling $\gamma=1$. For $y\geq 2$, the top row shows the probability that all $y$ students find the teacher configuration at the end of the SA, and the bottom row shows the mean generalization error, also averaged over the $y$ students. However, we observe numerically that, except at extremely fast annealing rates (e.g., the first panel, with $\eta=10^{-2}$), the students behave practically identically as a function of temperature. The right-most panels are the same as those shown in Fig. 2 of the main text.
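The RSA experiments summarized in Figure 2 and Figure S3 can be pictured with a minimal Metropolis sketch: $y$ binary students are updated by single-weight flips while the temperature is lowered linearly, and performance is read off through $\varepsilon_g = \pi^{-1}\arccos(R)$, with $R$ the normalized student-teacher overlap. Everything beyond that formula is a hypothetical choice made for illustration: the error-counting cost, the interpretation of the annealing rate $\eta$ as a temperature decrement per sweep, the starting temperature, and the names `replicated_sa` and `generalization_error`. It is not the authors' implementation.

```python
import numpy as np


def generalization_error(w, teacher):
    """eps_g = arccos(R) / pi, with R = w . teacher / N."""
    R = np.dot(w, teacher) / len(teacher)
    return np.arccos(np.clip(R, -1.0, 1.0)) / np.pi


def replicated_sa(X, labels, y=3, gamma=1.0, T0=0.5, eta=1e-3, rng=None):
    """Hypothetical Metropolis annealing of y coupled binary students.

    Assumptions: cost = number of training errors per student; coupling =
    gamma * Hamming distance over all pairs; T drops by eta after each sweep
    of y*N attempted flips. X has +/-1 entries and N odd (no zero fields).
    """
    rng = np.random.default_rng() if rng is None else rng
    P, N = X.shape
    W = rng.choice([-1, 1], size=(y, N))
    H = W @ X.T  # (y, P) local field of each student on each pattern
    T = T0
    while T > 0:
        for _ in range(y * N):
            a, i = rng.integers(y), rng.integers(N)
            s = W[a, i]
            new_h = H[a] - 2 * s * X[:, i]  # fields after flipping W[a, i]
            # Change in the number of misclassified patterns for student a.
            d_cost = (np.sum(np.sign(new_h) != labels)
                      - np.sum(np.sign(H[a]) != labels))
            # Flipping s adds +1 to the Hamming distance to every student that
            # agreed at site i and -1 for every student that disagreed.
            d_coup = gamma * s * (W[:, i].sum() - s)
            dE = d_cost + d_coup
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                W[a, i] = -s
                H[a] = new_h
        T -= eta
    return W
```

Setting `gamma=0` decouples the students, so the sketch then amounts to $y$ independent single-perceptron SA runs; the fields `H` are updated incrementally so that each attempted flip costs $O(P)$ rather than $O(PN)$.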
