Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks

Brandon L. Annesi, Enrico M. Malatesta, Francesco Zamponi

TL;DR

The paper analyzes storage in two continuous non-convex neural models—the Tree-Committee Machine and the Negative Perceptron—using a full-RSB replica framework to compute the exact SAT/UNSAT transition at density $\alpha_c(\kappa)$. It shows that in the infinite-width limit the energetic coupling reduces to a Gaussian process with an activation-dependent kernel $\Delta(q)$, leading to a Parisi-variational problem for the order-parameter function $q(x)$ and its associated PDEs. A novel Gardner transition line is identified in the negative perceptron, separating a no-overlap-gap fRSB phase from an overlap-gap Gardner phase, with important implications for AMP-based algorithms that rely on connected overlap distributions. The work also demonstrates an algorithmic gap: gradient-based methods fail to reach the exact capacity, and iAMP’s provable guarantees require the no-overlap-gap condition, which breaks down in the Gardner phase. Collectively, these findings illuminate the intricate geometry of solution spaces in wide two-layer networks and their impact on learning dynamics and algorithmic performance, particularly in overparameterized, non-convex regimes.
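The infinite-width reduction mentioned above can be illustrated with a small numerical sketch. This summary does not give the exact definition of $\Delta(q)$, so the code below assumes, purely for illustration, a kernel of the form $\mathbb{E}[\varphi(u)\varphi(v)]$ for standard Gaussians $u, v$ with correlation $q$. The paper's kernel may be centered or normalized differently. The function names `relu_kernel_closed_form` and `kernel_monte_carlo` are hypothetical. For ReLU the expectation has a known closed form (the order-1 arc-cosine kernel), which the Monte Carlo estimate can be checked against.

```python
import math
import random


def relu(x):
    """ReLU activation."""
    return x if x > 0.0 else 0.0


def relu_kernel_closed_form(q):
    """E[relu(u) relu(v)] for standard Gaussians u, v with correlation q.

    Assumed form of the kernel (illustrative, not taken from the paper):
    (sin(theta) + (pi - theta) * cos(theta)) / (2 * pi), with theta = arccos(q).
    """
    q = max(-1.0, min(1.0, q))  # guard against rounding just outside [-1, 1]
    theta = math.acos(q)
    return (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / (2.0 * math.pi)


def kernel_monte_carlo(phi, q, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[phi(u) phi(v)] for a generic activation phi."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - q * q)
    total = 0.0
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        # v has unit variance and correlation q with u
        v = q * u + s * rng.gauss(0.0, 1.0)
        total += phi(u) * phi(v)
    return total / n_samples


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.9):
        exact = relu_kernel_closed_form(q)
        estimate = kernel_monte_carlo(relu, q)
        print(f"q={q:.1f}  closed form={exact:.4f}  Monte Carlo={estimate:.4f}")
```

The Monte Carlo routine accepts any activation, so passing `math.erf` instead of `relu` gives the analogous estimate for an Erf non-linearity under the same assumed definition.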

Abstract

We analyze the problem of storing random pattern-label associations using two classes of continuous non-convex weight models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and generic activation function. Using a full-RSB ansatz we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron we show that the overlap distribution of typical states displays an overlap gap (a disconnected support) in certain regions of the phase diagram defined by the value of the margin and the density of patterns to be stored. This implies that some recent theorems that ensure convergence of Approximate Message Passing (AMP) based algorithms to capacity are not applicable. Finally, we show that Gradient Descent is not able to reach the maximal capacity, irrespective of the presence of an overlap gap for typical states. This finding, similar to what occurs in binary weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.

Paper Structure

This paper contains 42 sections, 128 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Tree committee machine architecture.
  • Figure 2: Overlap $q(x)$ for the infinite-width tree-committee machine with ReLU non-linearity, near the onset of RSB, which occurs at $\alpha_{\text{dAT}} \sim 1.7212$ [relu_locent] (left panel), and near the critical capacity (right panel).
  • Figure 3: Minimum and maximum overlaps $q_m$ and $q_M$ as a function of $\alpha$ for the ReLU (left panel) and Erf (right panel) activation functions with $\kappa=0$. For $\alpha \le \alpha_{\text{dAT}}$, the RS ansatz is correct, so $q_m = q_M$. As $\alpha \to \alpha_c$, $q_M \to 1$. (Inset) We show that $q_M$ scales as a power law, see equation \ref{eq::qM_scaling}, with an exponent $\sigma \simeq 1.4157$. Dots are exact numerical solutions, lines are power-law fits.
  • Figure 4: Inverse reduced pressure as a function of the constraint density $\alpha$ for the infinite-width tree-committee machine with ReLU (left panel) and Erf (right panel) activation functions with $\kappa=0$. The blue and orange lines represent the RS and 1RSB predictions. The red dots represent the solutions obtained using $k=100$ steps of RSB. As $\alpha \to \alpha_{c}$, the inverse reduced pressure scales as $\tilde{p}^{-1} \sim \alpha - \alpha_c$. The red line represents a fit to the $k$-RSB data near the critical capacity.
  • Figure 5: Phase Diagram of the Negative Perceptron. The dynamical transition line $\alpha_{dyn}(\kappa)$, which exists for $\kappa < \kappa_{\text{RFOT}}$, is not displayed for clarity, but it can be found in [Baldassi2023Typical]. Dashed lines represent linear interpolations of the Gardner and 1+fRSB transitions to their intersections with the dAT line, which occur at $\kappa=\kappa_{1RSB}$. The dotted line represents the critical capacity evaluated with the RS ansatz.
  • ...and 7 more figures