High-dimensional manifold of solutions in neural networks: insights from statistical physics

Enrico M. Malatesta

TL;DR

This work surveys a statistical-physics view of high-dimensional neural network solution spaces. It focuses on the perceptron with binary or spherical weights and on the SAT/UNSAT transition in the limit of large N at fixed α = P/N, the number of training patterns per weight. It leverages the replica method to derive Gardner volumes and RS saddle-point equations, elucidating how the landscape shrinks and restructures as constraints accumulate. It further connects geometry to learning dynamics by analyzing local entropy (Franz-Parisi) and delineating algorithmic hardness via the Overlap Gap Property, while showing how high-entropy regions can enhance robustness and generalization. Finally, it examines linear mode connectivity between solutions, identifying regimes where straight paths of zero training error exist or fail and highlighting a kernel region that supports connectivity in the overparameterized regime. Together, these results illuminate the global shape of neural-network solution manifolds and their implications for optimization and generalization.
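The chain from Gardner volume to RS saddle point to SAT/UNSAT transition can be reproduced numerically for the binary perceptron. The sketch below is a minimal illustration, assuming the standard RS free entropy from the literature (written here from general knowledge, not copied from these notes), $\phi(q,\hat q) = -\tfrac{1}{2}\hat q(1-q) + \int Dz\,\ln 2\cosh(\sqrt{\hat q}\,z) + \alpha \int Dz\,\ln H\big((\kappa+\sqrt{q}\,z)/\sqrt{1-q}\big)$ with $H(x)=\int_x^\infty Dz$. It solves the saddle-point equations by plain fixed-point iteration from $q=0$ and bisects for the $\alpha$ at which the RS entropy vanishes, i.e. the zero-entropy criterion described for Figure 3. The quadrature grid, tolerances, and all function names are hypothetical illustrative choices. For $\kappa = 0$ the result should be close to the classic value $\alpha_c \approx 0.833$.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def gauss_pdf(x):
    return math.exp(-0.5 * x * x) / SQRT2PI


def H(x):
    """Gaussian tail H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / SQRT2)


def log_H(x):
    # Leading-order asymptotics for large x avoids log(0).
    return math.log(H(x)) if x < 25.0 else -0.5 * x * x - math.log(x * SQRT2PI)


def g_over_h(x):
    # Ratio G(x)/H(x); asymptotically x + 1/x for large x.
    return gauss_pdf(x) / H(x) if x < 25.0 else x + 1.0 / x


def log_2cosh(x):
    ax = abs(x)  # overflow-safe log(2 cosh x)
    return ax + math.log1p(math.exp(-2.0 * ax))


def simpson_gauss_grid(n=301, zmax=7.5):
    """Nodes/weights so that sum(w * f(z)) approximates int Dz f(z); n must be odd."""
    h = 2.0 * zmax / (n - 1)
    nodes, weights = [], []
    for i in range(n):
        z = -zmax + i * h
        c = 1.0 if i in (0, n - 1) else (4.0 if i % 2 else 2.0)
        nodes.append(z)
        weights.append(c * h / 3.0 * gauss_pdf(z))
    return nodes, weights


Z, W = simpson_gauss_grid()


def gauss_avg(f):
    return sum(w * f(z) for z, w in zip(Z, W))


def rs_saddle(alpha, kappa=0.0, tol=1e-9, max_iter=10000):
    """Iterate q_hat = alpha/(1-q) <(G/H)^2(u)>, q = <tanh^2(sqrt(q_hat) z)>,
    with u = (kappa + sqrt(q) z) / sqrt(1 - q), starting from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        a, b = math.sqrt(q), math.sqrt(1.0 - q)
        q_hat = alpha / (1.0 - q) * gauss_avg(lambda z: g_over_h((kappa + a * z) / b) ** 2)
        s = math.sqrt(q_hat)
        q_new = gauss_avg(lambda z: math.tanh(s * z) ** 2)
        if abs(q_new - q) < tol:
            return q_new, q_hat
        q = q_new
    raise RuntimeError("RS iteration did not converge")


def rs_entropy(alpha, kappa=0.0):
    """RS entropy per weight (in nats) at the saddle point."""
    q, q_hat = rs_saddle(alpha, kappa)
    a, b, s = math.sqrt(q), math.sqrt(1.0 - q), math.sqrt(q_hat)
    entropic = gauss_avg(lambda z: log_2cosh(s * z))
    energetic = gauss_avg(lambda z: log_H((kappa + a * z) / b))
    return -0.5 * q_hat * (1.0 - q) + entropic + alpha * energetic


def zero_entropy_alpha(kappa=0.0, lo=0.5, hi=1.0, steps=16):
    """Bisection for the alpha where the RS entropy changes sign (needs s(lo) > 0 > s(hi))."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if rs_entropy(mid, kappa) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(zero_entropy_alpha(0.0))  # should be close to 0.833 for kappa = 0
```

At $\alpha = 0$ the entropy reduces to $\ln 2$, the log-count of all binary weight vectors per weight, which is a quick sanity check of the conventions.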

Abstract

In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights in the classification setting. I review Gardner's approach, based on the replica method, and the derivation of the SAT/UNSAT transition in the storage setting. Then I discuss some recent works that unveiled how zero-training-error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary-weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.
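Linear mode connectivity asks whether the straight segment $w(t) = (1-t)\,w_a + t\,w_b$ between two solutions stays at zero training error. The sketch below checks this on a toy spherical perceptron, assuming Gaussian patterns, random $\pm 1$ labels, the stability $\Delta_\mu = y_\mu\, w\cdot\xi_\mu/|w|$ (equal to $y_\mu\, w\cdot\xi_\mu/\sqrt{N}$ under the usual normalization $|w|^2 = N$), and solutions produced by the classic perceptron learning rule as a stand-in for whatever sampling scheme the notes use. All function names are hypothetical. For $\kappa \ge 0$ the constraint set is a convex cone, so every point on the segment is itself a solution and the printed list should contain only zeros; segments that fail require $\kappa < 0$ or binary weights, which this toy does not cover.

```python
import math
import random


def stabilities(w, X, y):
    """Delta_mu = y_mu (w . xi_mu) / |w| for every pattern."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [label * sum(wi * xi for wi, xi in zip(w, x)) / norm for x, label in zip(X, y)]


def training_errors(w, X, y, kappa=0.0):
    """Number of patterns violating the constraint Delta_mu >= kappa."""
    return sum(1 for d in stabilities(w, X, y) if d < kappa)


def perceptron_solution(X, y, seed, max_epochs=10000):
    """Hypothetical helper: perceptron rule from a random start (margin kappa = 0)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        updated = False
        for mu in order:
            if y[mu] * sum(wi * xi for wi, xi in zip(w, X[mu])) <= 0.0:
                w = [wi + y[mu] * xi for wi, xi in zip(w, X[mu])]
                updated = True
        if not updated:  # a full pass without updates: every stability is positive
            return w
    raise RuntimeError("no zero-error solution found; the instance may be UNSAT")


def errors_along_path(w_a, w_b, X, y, kappa=0.0, n_steps=21):
    """Training errors at n_steps >= 2 evenly spaced points of (1 - t) w_a + t w_b."""
    path = []
    for k in range(n_steps):
        t = k / (n_steps - 1)
        w_t = [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]
        path.append(training_errors(w_t, X, y, kappa))
    return path


if __name__ == "__main__":
    N, P = 100, 50  # alpha = 0.5, below the kappa = 0 spherical capacity alpha_c = 2
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]
    y = [rng.choice((-1, 1)) for _ in range(P)]
    w_a = perceptron_solution(X, y, seed=1)
    w_b = perceptron_solution(X, y, seed=2)
    print(errors_along_path(w_a, w_b, X, y))
```

The random initial weights make the two runs land on different solutions; evaluating the segment on a grid of `t` values is the simplest discrete proxy for the continuous path.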

Paper Structure (18 sections, 51 equations, 9 figures)

Figures (9)

  • Figure 1: Left and middle panels: space of solutions (green shaded area) in the spherical perceptron with $\kappa = 0$ (left panel) and $\kappa < 0$ (middle). The green shaded area is obtained by taking the intersection of two spherical caps corresponding to patterns $\boldsymbol{\xi}^1$ and $\boldsymbol{\xi}^2$. Right panel: binary weight case. The cube (red) is inscribed in the sphere. The green shaded area represents the (convex) space of solutions of the corresponding spherical problem. In this simple example only two vertices of the cube are inside the green region.
  • Figure 2: Distance between solutions sampled with a given margin $\kappa$ as a function of the constraint density $\alpha$ for the spherical case (left panel) and the binary perceptron (right panel). In the binary case, the lines change from solid to dashed when the entropy of solutions becomes negative.
  • Figure 3: Binary perceptron. Left: RS entropy as a function of $\alpha$ for several values of the margin $\kappa$. Dashed lines show the nonphysical parts of the curves, where the entropy is negative. The value of $\alpha$ at which the entropy vanishes corresponds to the SAT/UNSAT transition $\alpha_c(\kappa)$. Right: SAT/UNSAT transition $\alpha_c(\kappa)$ obtained from the zero-entropy condition, Eq. (\ref{eq::zero_entropy_condition}).
  • Figure 4: Spherical perceptron. Left: RS entropy as a function of $\alpha$ for several values of the margin $\kappa$. The value of $\alpha$ at which the entropy goes to $-\infty$ (dashed vertical lines) corresponds to the SAT/UNSAT transition $\alpha_c(\kappa)$. Right: RS SAT/UNSAT transition $\alpha_c(\kappa)$ as given by Eq. (\ref{eq::alphac_spherical}). For $\kappa<0$ the line is dashed as a reminder that the RS prediction is only an upper bound on the true value, since the model becomes non-convex.
  • Figure 5: Binary perceptron. Left: Averaged local entropy of typical solutions as a function of distance, for several values of $\alpha$. Right: Distance $d_{\text{min}}$ at which the Franz-Parisi entropy $\phi_{\text{FP}}$ vanishes, as a function of $\alpha$. At the SAT/UNSAT transition (dashed vertical line) the minimal distance to the closest solutions coincides with the typical distance between solutions, $d \simeq 0.218$.
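Figure 4 plots the RS capacity of the spherical perceptron. Gardner's classic result, which is presumably the equation the caption cites, reads $\alpha_c(\kappa) = \left[\int_{-\kappa}^{\infty} Dz\,(z+\kappa)^2\right]^{-1}$, with $Dz$ the standard Gaussian measure. Elementary Gaussian integrals give the closed form $\int_{-\kappa}^{\infty} Dz\,(z+\kappa)^2 = (1+\kappa^2)\,\Phi(\kappa) + \kappa\,\phi(\kappa)$, where $\Phi$ and $\phi$ are the standard normal CDF and density. A minimal sketch follows; the function names are hypothetical, and for $\kappa < 0$ the returned value is only the RS upper bound mentioned in the caption.

```python
import math


def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def alpha_c_spherical(kappa):
    """RS (Gardner) capacity of the spherical perceptron with margin kappa.

    Uses int_{-kappa}^inf Dz (z + kappa)^2 = (1 + kappa^2) Phi(kappa) + kappa phi(kappa).
    For kappa < 0 this is only an upper bound (the problem is non-convex there).
    """
    return 1.0 / ((1.0 + kappa ** 2) * gaussian_cdf(kappa) + kappa * gaussian_pdf(kappa))


if __name__ == "__main__":
    for kappa in (-1.0, 0.0, 0.5, 1.0):
        print(f"kappa = {kappa:+.1f}  alpha_c = {alpha_c_spherical(kappa):.4f}")
```

At $\kappa = 0$ the integral equals $1/2$, so the sketch returns exactly $\alpha_c = 2$. The denominator has derivative $2\int_{-\kappa}^{\infty} Dz\,(z+\kappa) > 0$, so $\alpha_c(\kappa)$ decreases monotonically as the margin grows.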