Injectivity of ReLU networks: perspectives from statistical physics

Antoine Maillard; Afonso S. Bandeira; David Belius; Ivan Dokmanić; Shuta Nakajima

TL;DR

The paper studies the injectivity of a random ReLU layer in the high-dimensional regime by recasting it as a ground-state question for the spherical perceptron and analyzing that question with tools from statistical physics. It contrasts the average Euler-characteristic approach, which predicts a threshold near $\alpha_\mathrm{inj}^\mathrm{Euler} \approx 8.34$, with a replica-method analysis of replica-symmetric (RS) and full replica-symmetry-breaking (FRSB) solutions. This yields a provable RS upper bound $\alpha_\mathrm{inj}^\mathrm{RS} \approx 7.65$, which refutes the Euler prediction, and a sharper (non-rigorous) FRSB prediction $\alpha_\mathrm{inj}^\mathrm{FRSB} \approx 6.698$. The work expresses the free entropy through a (conjectured) Parisi formula and derives an algorithmic procedure to solve the zero-temperature FRSB equations, producing quantitative thresholds that disagree with the random-geometry predictions. The results bridge spin-glass theory and random-geometric analyses of neural networks; the origin of the discrepancy and extensions to multi-layer networks remain open. Overall, the paper delivers a rigorous RS bound, a detailed FRSB computation, and a coherent narrative linking injectivity to the ground-state geometry of a disordered system, together with a public numerical implementation for reproducibility.

Abstract

When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry, giving rise to a conjectured sharp injectivity threshold for $\alpha = m/n$ obtained by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min-max theorem to prove a replica-symmetric upper bound on the threshold that refutes the Euler characteristic prediction. Along the way, we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.

Paper Structure

This paper contains 76 sections, 17 theorems, 257 equations and 6 figures.

Introduction
Injectivity and (random) neural networks
Injectivity and random geometry
Notation
Statistical physics and the spherical perceptron
Injectivity as energy minimization
Statistical physics of disordered systems
Cover's theorem and the bound $\alpha_\mathrm{inj} \geq 3$
Thermal relaxation: the Gibbs-Boltzmann distribution
Universality of the free entropy
Related work
Average Euler characteristic prediction
Physics and mathematics of the perceptron
Other related work
Main results
...and 61 more sections

Key Result

Proposition 1.1

The probability $p_{m,n}$ that $\varphi_{\mathbf{W}} : x \mapsto \mathrm{ReLU}(\mathbf{W}x)$ is injective is

$$p_{m,n} = \mathbb{P}\left[V \cap C_{m,n} = \{0\}\right],$$

where $V$ is a uniformly random $n$-dimensional subspace of $\mathbb{R}^m$, and $C_{m,n}$ is the set of vectors in $\mathbb{R}^m$ with strictly fewer than $n$ strictly positive coordinates.
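One practical reading of Proposition 1.1: if some $x \neq 0$ makes $\mathbf{W}x$ have fewer than $n$ strictly positive coordinates, then $\mathbf{W}x$ is a nonzero element of $V \cap C_{m,n}$ and the layer is not injective. The NumPy sketch below is an illustration under our own assumptions, not the authors' code. It searches random directions for such an $x$ and, when one is found, builds an explicit colliding pair by moving $x$ along a direction orthogonal to the active rows, with a step small enough that the inactive coordinates stay non-positive. The names `relu_layer` and `find_collision` and the plain random-sampling strategy are hypothetical. A `None` result is inconclusive: near the threshold such witnesses are rare, and certifying injectivity requires minimizing the number of positive coordinates over the whole sphere, which is the ground-state problem the paper studies.

```python
import numpy as np


def relu_layer(W, x):
    """The single-layer map x -> ReLU(W x)."""
    return np.maximum(W @ x, 0.0)


def find_collision(W, num_samples=1000, seed=0):
    """Hypothetical helper: random search for x != x_prime with equal ReLU(W .) outputs.

    Returns (x, x_prime) on success, or None. None is inconclusive: it does
    NOT prove that the layer is injective.
    """
    n = W.shape[1]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, n))   # random search directions
    Z = X @ W.T                                 # row j holds W @ X[j]
    counts = (Z > 0).sum(axis=1)                # positive coordinates of W x
    best = int(np.argmin(counts))
    if counts[best] >= n:
        return None                             # no witness found by sampling
    x, z = X[best], Z[best]
    active = z > 0                              # fewer than n active rows
    if active.any():
        # Fewer than n rows leave a nonzero direction d orthogonal to all of them.
        _, _, Vt = np.linalg.svd(W[active])
        d = Vt[-1]
    else:
        d = x / np.linalg.norm(x)               # W x < 0 everywhere: rescaling x works
    # Step size that keeps every inactive coordinate (z_i < 0) negative.
    wd = W[~active] @ d
    grow = wd > 0
    ratios = -z[~active][grow] / wd[grow]
    eps = 0.5 * ratios.min() if ratios.size else 1.0
    return x, x + eps * d


# Example: alpha = m / n = 1.5 is far below the injectivity threshold.
rng = np.random.default_rng(1)
W = rng.standard_normal((30, 20))
pair = find_collision(W)
```

For `W` of shape `(30, 20)`, a random direction typically activates about 15 of the 30 rows, so a witness is found almost immediately; for $\alpha$ well above the threshold, random sampling returns `None`.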

Figures (6)

Figure 1: $T=0$ limit of the RS, 1RSB and FRSB solutions, as a function of $\alpha$. We compare the predictions for the ground state energy $f^\star(\alpha) = \lim_{\beta \to \infty} [-\Phi(\alpha,\beta)/\beta]$ and the zero-temperature susceptibility $\chi$ (see the sections on upper bounds and on full replica symmetry breaking). The green area is forbidden for $\alpha_\mathrm{inj}$ by the replica-symmetric (RS stability) lower bound.
Figure 2: The function $\Phi_{\rm RS}(r, q) - \Phi_{\rm RS}(r, q^\star(r))$ as a function of $q \in [0,1]$, for different values of $r$ close to $1$, and $q^\star(r)$ the unique solution to $\partial_q \Phi_{\rm RS}(r, q) = 0$. We observe that $q^\star(r)$ is a global maximum for $r > 1$, and becomes a global minimum for $r < 1$. Here $\alpha = 5$ and $\beta = 1$.
Figure 3: Illustration of the finite-RSB and full-RSB structure in the functions $q(x)$ (left) and $\rho(q)$ (right). In terms of the overlap distribution, the $k$-RSB ansatz (in blue) corresponds to $\rho(q) = \sum_{i=0}^k (m_{i} - m_{i-1}) \delta(q - q_i)$, with the convention $m_{-1} = 0$. In orange, the full-RSB distribution is $\rho(q) = \int_0^1 \delta(q-q(x)) \, \mathrm{d} x$; it is assumed to have two delta peaks at the edges of its support, $q \in \{q_m, q_M\}$, with masses $\{x_m, 1-x_M\}$ (see the equation in the right panel). Here $x(q)$ denotes the functional inverse of $q(x)$.
Figure 4: $T=0$ limit quantities of the RS, 1RSB and FRSB solutions, as a function of $\alpha$. We compare the predictions for the different forms of the function $q(x)$ corresponding to the assumed level of replica symmetry breaking.
Figure 5: Computation of $\alpha_\mathrm{inj}^{\rm FRSB}$ using the FRSB algorithmic procedure. For different values of $x_\mathrm{max}$ and $k$, we give an interval numerically found to contain $\alpha_\mathrm{inj}^{\rm FRSB}$. For the main FRSB result, we took the interval of values obtained with $k = 200$ and $x_\mathrm{max} = 15$. More details on the numerical procedure are given in the appendix.
...and 1 more figure
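The Figure 3 caption relates the step function $q(x)$ to the overlap distribution via $\rho(q) = \int_0^1 \delta(q - q(x)) \, \mathrm{d}x$. A minimal Python sketch, assuming $q(x) = q_i$ on $(m_{i-1}, m_i]$ with $m_{-1} = 0$ and $m_k = 1$ (our reading of the caption), checks that pushing the uniform measure on $[0, 1]$ through $q(x)$ recovers the $k$-RSB atoms with masses $m_i - m_{i-1}$. The function names and the 1RSB parameter values are hypothetical.

```python
import numpy as np


def krsb_overlap_atoms(m_breaks, q_values):
    """Atoms of rho(q) for a k-RSB ansatz, following the Figure 3 convention.

    rho(q) = sum_i (m_i - m_{i-1}) * delta(q - q_i), with m_{-1} = 0.
    Assumes the last breakpoint m_k equals 1, so the masses sum to one.
    """
    m = np.concatenate(([0.0], np.asarray(m_breaks, dtype=float)))
    masses = np.diff(m)                          # m_i - m_{i-1}
    return dict(zip(q_values, masses))


def overlap_distribution_from_qx(q_of_x, num_points=100_000):
    """rho(q) = int_0^1 delta(q - q(x)) dx, approximated on a midpoint grid of [0, 1]."""
    x = (np.arange(num_points) + 0.5) / num_points
    values, counts = np.unique(q_of_x(x), return_counts=True)
    return dict(zip(values, counts / num_points))


# Hypothetical 1RSB example: q(x) = 0.2 on [0, 0.3] and 0.8 on (0.3, 1].
atoms = krsb_overlap_atoms([0.3, 1.0], [0.2, 0.8])
numeric = overlap_distribution_from_qx(lambda x: np.where(x <= 0.3, 0.2, 0.8))
# Both give mass 0.3 at q = 0.2 and 0.7 at q = 0.8 (up to rounding).
```

In the full-RSB case, the same pushforward of a continuous, increasing $q(x)$ that is flat near $x = 0$ and $x = 1$ produces the two edge atoms of masses $x_m$ and $1 - x_M$ plus a continuous part.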

Theorems & Definitions (19)

Proposition 1.1: Injectivity and random geometry
Lemma 1.2: Cover's lower bound for injectivity
Theorem 1.3: Known bounds for injectivity [puthawala2022globally, paleka2021injectivity, clum2022topics]
Theorem 1.4: Free entropy concentration
Corollary 1.5: Sufficient condition for non-injectivity
Conjecture 1.6: Tightness of the free entropy bound
Conjecture 1.7: Parisi formula
Theorem 1.8: Replica-symmetric upper bound for the injectivity threshold
Lemma A.1
Lemma A.2
...and 9 more
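Lemma 1.2 gives the lower bound $\alpha_\mathrm{inj} \geq 3$. One standard way to see a bound of this kind is sketched below; it is an argument under our own assumptions and not necessarily the paper's proof. Fix any $n - 1$ rows of $W$. If the remaining $N = m - n + 1$ rows all lie in an open half-space $\{w : \langle w, x\rangle < 0\}$ for some $x \neq 0$, then $Wx$ has at most $n - 1$ positive coordinates, so by Proposition 1.1 the layer is not injective. For i.i.d. Gaussian rows, Wendel's theorem (the probabilistic form of Cover's counting argument) gives this probability as $2^{-(N-1)} \sum_{k=0}^{n-1} \binom{N-1}{k}$. This quantity tends to $1$ when $N < 2n$, that is, when $\alpha < 3$. The function names are hypothetical.

```python
from math import comb


def wendel_probability(N: int, n: int) -> float:
    """P(N i.i.d. symmetric points in R^n, in general position, lie in an open half-space).

    Wendel's formula: 2^{-(N-1)} * sum_{k=0}^{n-1} C(N-1, k).
    """
    # Exact integer arithmetic; Python's int / int true division rounds correctly.
    return sum(comb(N - 1, k) for k in range(n)) / 2 ** (N - 1)


def non_injectivity_lower_bound(m: int, n: int) -> float:
    """Lower bound on P(ReLU(W .) is not injective) for Gaussian W of shape (m, n), m >= n.

    Any fixed n - 1 rows are left free; the other m - n + 1 rows must share a
    half-space so that W x has at most n - 1 positive coordinates.
    """
    return wendel_probability(m - n + 1, n)


# The bound is close to 1 below alpha = 3 and collapses above it.
for alpha in (2.5, 3.0, 3.5):
    print(alpha, non_injectivity_lower_bound(int(alpha * 100), 100))
```

Since $\sum_{k<n} \binom{N-1}{k} / 2^{N-1} = \mathbb{P}[\mathrm{Bin}(N-1, 1/2) \le n-1]$, the transition at $N \approx 2n$ is a binomial concentration effect, which is why the bound stops being informative beyond $\alpha = 3$.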
