Injectivity of ReLU networks: perspectives from statistical physics
Antoine Maillard, Afonso S. Bandeira, David Belius, Ivan Dokmanić, Shuta Nakajima
TL;DR
The paper studies the injectivity of a single random ReLU layer in the high-dimensional regime by recasting it as a question about the ground state of the spherical perceptron, a spin glass model from statistical physics. It contrasts two predictions for the injectivity threshold: the average Euler-characteristic approach, which gives $\alpha_{\mathrm{inj}}^{\mathrm{Euler}}\approx 8.34$, and a replica-method analysis covering both the replica-symmetric (RS) and full replica-symmetry-breaking (FRSB) structures. The replica-symmetric computation yields a provable upper bound $\alpha_{\mathrm{inj}}^{\mathrm{RS}}\approx 7.65$, which refutes the Euler prediction, while the non-rigorous FRSB analysis gives a tighter threshold $\alpha_{\mathrm{inj}}^{\mathrm{FRSB}}\approx 6.698$. The work sets up a Parisi-formula framework for the free entropy and derives an algorithmic procedure for solving the zero-temperature FRSB equations, with a public numerical implementation for reproducibility. The results connect spin-glass theory with random-geometric analyses of neural networks. Two questions remain open: the origin of the discrepancy between the two predictions, and extensions to multi-layer networks.
Abstract
When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry, giving rise to a conjectured sharp injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min-max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.
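To make the setup concrete, here is a minimal numerical sketch in Python with NumPy; it is not the authors' implementation. It relies on one elementary fact: if a direction $x$ has fewer than $n$ strictly positive entries in $Wx$ and no entry exactly zero, then the active rows of $W$ have a non-trivial null space, and moving $x$ slightly along a null vector leaves $\mathrm{ReLU}(Wx)$ unchanged, so the layer is not injective. The function names `search_low_activation` and `collision_from_witness`, the random-search strategy, and the sizes $n = 8$, $m = 16$ are illustrative assumptions. A random search can only certify non-injectivity: failing to find a witness proves nothing, and the thresholds quoted above are asymptotic statements that small examples cannot probe.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def search_low_activation(W, num_trials=2000, seed=0):
    """Random search (hypothetical helper) for a unit vector x that makes
    as few entries of W @ x strictly positive as possible.
    Returns (smallest count found, corresponding x)."""
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    X = rng.standard_normal((num_trials, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # project onto the unit sphere
    counts = (X @ W.T > 0).sum(axis=1)             # active neurons per direction
    best = int(np.argmin(counts))
    return int(counts[best]), X[best]


def collision_from_witness(W, x):
    """Given x with fewer than n active rows, return x_alt != x such that
    relu(W @ x_alt) equals relu(W @ x) up to floating-point error."""
    n = W.shape[1]
    pre = W @ x
    active = pre > 0
    if active.sum() >= n:
        raise ValueError("x activates at least n rows; it is not a witness")
    if active.any():
        # Fewer than n active rows: the last right-singular vector lies in
        # the null space of the active sub-matrix.
        _, _, Vt = np.linalg.svd(W[active], full_matrices=True)
        v = Vt[-1]
    else:
        v = x / np.linalg.norm(x)  # no active rows: any direction works
    inactive = ~active
    if inactive.any():
        # Step small enough that every inactive row stays negative.
        # (If some pre-activation were exactly zero, eps would be zero.)
        margin = -pre[inactive]
        slope = np.abs(W[inactive] @ v) + 1e-12
        eps = 0.5 * float(np.min(margin / slope))
    else:
        eps = 1.0
    return x + eps * v


# Illustrative sizes only: alpha = m / n = 2.
n, m = 8, 16
W = np.random.default_rng(1).standard_normal((m, n))
count, x = search_low_activation(W)
x_alt = collision_from_witness(W, x)
print("fewest active neurons found:", count, "with n =", n)
print("outputs coincide:", np.allclose(relu(W @ x), relu(W @ x_alt)))
```

Counting active neurons over directions on the sphere plays the role of an energy function here, and the random search only gives an upper bound on its minimum, which is the ground-state quantity the abstract refers to.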
