Overlap Gap and Computational Thresholds in the Square Wave Perceptron

Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. Schwartzbach, Riccardo Zecchina

TL;DR

The emergence of an overlap gap at a threshold αOGP(δ), which can be made arbitrarily small by suitably increasing the frequency of oscillations 1/δ of the activation, suggests that in this small-δ regime, typical instances of the problem are hard to solve even for small values of α.

Abstract

Square Wave Perceptrons (SWPs) form a class of neural network models with an oscillating activation function that exhibit intriguing "hardness" properties in the high-dimensional limit at a fixed constraint density $\alpha = O(1)$. In this work, we examine two key aspects of these models. The first is related to the so-called overlap-gap property, a disconnectivity feature of the geometry of the solution space of combinatorial optimization problems that is proven to cause the failure of a large family of solvers and conjectured to be a symptom of algorithmic hardness. We identify, both in the storage and in the teacher-student settings, the emergence of an overlap gap at a threshold $\alpha_{\mathrm{OGP}}(\delta)$, which can be made arbitrarily small by suitably increasing the frequency of oscillations $1/\delta$ of the activation. This suggests that in this small-$\delta$ regime, typical instances of the problem are hard to solve even for small values of $\alpha$. Second, in the teacher-student setup, we show that the recovery threshold of the planted signal for message-passing algorithms can be made arbitrarily large by reducing $\delta$. These properties make SWPs both a challenging benchmark for algorithms and an interesting candidate for cryptographic applications.
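
As a rough illustration of the teacher-student setting described in the abstract, the following Python sketch builds a planted instance and counts violated constraints. Everything specific in it is an assumption made for illustration and is not taken from the text above: the exact form of the activation (a $\pm 1$ square wave that flips sign every $\delta$ and reduces to the sign function as $\delta\to\infty$), binary $\pm 1$ weights, Gaussian inputs, the $1/\sqrt{N}$ normalization, and the helper names `square_wave`, `planted_instance`, and `count_violations`.

```python
import numpy as np

def square_wave(h, delta):
    """Assumed activation: +1 on [0, delta), -1 on [delta, 2*delta), and so on.

    For delta -> infinity this reduces to sign(h) for nonzero h.
    """
    # Even-numbered intervals map to +1, odd-numbered intervals to -1.
    return np.where(np.floor(h / delta) % 2 == 0, 1, -1)

def preactivations(w, X):
    """Normalized fields h^mu = (w . x^mu) / sqrt(N) (assumed normalization)."""
    return X @ w / np.sqrt(w.size)

def planted_instance(n, alpha, delta, rng):
    """Teacher-student instance with M = alpha * N constraints (illustrative)."""
    m = int(alpha * n)
    X = rng.standard_normal((m, n))           # assumed Gaussian inputs
    w_teacher = rng.choice([-1, 1], size=n)   # assumed binary weights
    y = square_wave(preactivations(w_teacher, X), delta)
    return X, y, w_teacher

def count_violations(w, X, y, delta):
    """Number of input-output associations that w fails to reproduce."""
    return int(np.sum(square_wave(preactivations(w, X), delta) != y))

rng = np.random.default_rng(0)
X, y, w_teacher = planted_instance(n=200, alpha=0.5, delta=1.0, rng=rng)
w_random = rng.choice([-1, 1], size=200)
print("teacher violations:", count_violations(w_teacher, X, y, 1.0))
print("random-guess violations:", count_violations(w_random, X, y, 1.0))
```

By construction the teacher satisfies every constraint. In the storage setting the labels would instead be drawn at random, independently of any teacher.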

Paper Structure

This paper contains 31 sections, 164 equations, 13 figures.

Figures (13)

  • Figure 1: Factor graph representing the constraint satisfaction problem defined by \ref{eq:conditions}. Circles represent the variables, which in this case are the weights $w_i$, and squares represent the constraints, given by the input-output associations $(\boldsymbol{x}^{\mu},y^{\mu})$.
  • Figure 2: Satisfiability and teacher thresholds of the SWP, $\alpha_c(\delta)$ and $\alpha_T(\delta)$ respectively, as a function of $\delta$. The continuous line starting from the bottom right represents the RS computation of $\alpha_c(\delta)$. Note that for $\delta\rightarrow \infty$ one recovers the capacity of the ABP, $\alpha_c^{ABP}\approx 0.833$, for which the RS prediction has been proven to be exact. The continuous line starting from the top right represents the teacher threshold $\alpha_T(\delta)$, which in the planted ensemble is the value of $\alpha$ above which the teacher is the unique solution to the problem. For $\delta\rightarrow\infty$ the teacher threshold tends to the ABP one, $\alpha_T^{ABP}\approx 1.245$. The dashed line starting from the top right is the annealed computation of the teacher threshold.
  • Figure 3: (a): Storage capacity $\alpha_{sat}$ and annealed $m$-OGP thresholds of the SWP as a function of $\delta$. For each $\delta$, from top to bottom, the colored curves represent the annealed $m$-OGP thresholds for $m=2,\dots,m^{\star}(\delta)$, where $m^{\star}(\delta)=\min{\{15,\text{argmin}_m\{\alpha_{\mathrm{OGP}}(m)\}\}}$. The dots at $\delta=0$ correspond to $1/m$. The black curve at the bottom represents $\min_m \widetilde{\alpha}^{ann}_{OGP}(m,\delta)$ (see Sec. \ref{sec:ogpStor}). Inset: for large $\delta$, $\alpha_{c}$ reaches a plateau (dashed line) corresponding to the storage capacity of the asymmetric binary perceptron $\alpha_{c}^{ABP}\approx 0.833$ (see KrauthMezard1989). (b): Same as panel (a), with the addition of the dashed lines, which represent the replica symmetric corrections to the annealed $m$-OGP estimates for $m=2,3,4$. For small values of $\delta$ the RS predictions collapse onto the annealed curves. For large $\delta$, the RS thresholds deviate from the annealed ones.
  • Figure 4: (a): First-moment upper bound of $\alpha_{\mathrm{OGP}}$ in the SWP for different values of $\delta$ in the storage setting. From top to bottom, $\delta=1.5,1.2,1.1,1$. The true $\alpha_{\mathrm{OGP}}$ must be a non-increasing function of $m$, so the non-monotonicity of the first-moment estimates implies that the annealed computation cannot be exact. Note that the minimum shifts to larger values of $m$ upon decreasing $\delta$. (b): Annealed (upper curve) and replica symmetric (lower dashed curve) estimates of the $\alpha_{OGP}(m)$ thresholds as a function of $m$ for $\delta=1.5$. The replica symmetric ansatz does not cure the unphysical non-monotonicity of the annealed estimates of the thresholds.
  • Figure 5: (a): Average number of iterations $T_{sol}$ as a function of $\alpha$, for different values of $\delta$. From right to left, $\delta=1.5,1.2,1.1,1,0.6$. The behavior of $T_{sol}$ is compatible with $T_{sol}\sim \mathrm{e}^{\frac{a}{\alpha_{alg}(\delta)-\alpha}}$, implying that for small $\alpha_{alg}(\delta)-\alpha$, $\log^{-1}{(T_{sol})}\propto \alpha_{alg}(\delta)-\alpha$. This allows one to estimate the algorithmic thresholds $\alpha_{alg}(\delta)$ (see the inset of \ref{fig:DeltaAlpha}). Data are obtained by averaging over $80$ samples of size $N=4000$. (b): Difference $\Delta\alpha(\delta)$ between the optimum value of the RS estimate of the $m$-OGP threshold, namely $\min_{m}\alpha_{OGP}^{rs}(m)$, and the algorithmic thresholds estimated in panel (a). Inset: $\log^{-1}{(T_{sol})}$ as a function of $\alpha$ for different $\delta$. From right to left, $\delta=1.5,1.2,1.1,1,0.6$. Dashed lines are linear fits of the last points.
  • ...and 8 more figures
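
The threshold estimate described in the caption of Figure 5 can be sketched in a few lines of Python. If $T_{sol}\sim \mathrm{e}^{a/(\alpha_{alg}-\alpha)}$, then $1/\log T_{sol} = (\alpha_{alg}-\alpha)/a$ is linear in $\alpha$ and vanishes at $\alpha=\alpha_{alg}$, so a linear fit of the last points gives the threshold as its zero crossing. The function name `estimate_alpha_alg`, the choice of four fitted points, and the data below are illustrative assumptions; the data are synthetic, generated from the assumed law, and are not the paper's measurements.

```python
import numpy as np

def estimate_alpha_alg(alphas, t_sol, n_last=4):
    """Zero crossing of a linear fit of 1/log(T_sol) versus alpha.

    Only the last n_last points (those closest to the threshold) are fitted.
    """
    alphas = np.asarray(alphas, dtype=float)
    inv_log_t = 1.0 / np.log(np.asarray(t_sol, dtype=float))
    slope, intercept = np.polyfit(alphas[-n_last:], inv_log_t[-n_last:], 1)
    return -intercept / slope

# Synthetic data drawn from the assumed law T_sol = exp(a / (alpha_alg - alpha)).
a_true, alpha_alg_true = 0.5, 0.6   # hypothetical values, for illustration only
alphas = np.linspace(0.30, 0.55, 8)
t_sol = np.exp(a_true / (alpha_alg_true - alphas))

alpha_alg_est = estimate_alpha_alg(alphas, t_sol)
print(f"estimated alpha_alg = {alpha_alg_est:.4f}")
```

On noise-free synthetic data the fit recovers the input threshold up to floating-point error. With measured iteration counts, the result would depend on how many points are included in the fit and on how close they are to the threshold.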