Deep ReLU networks -- injectivity capacity upper bounds

Mihailo Stojnic

TL;DR

This work quantifies the injectivity capacity of deep ReLU networks by linking $l$-layer injectivity to $l$-extended $\ell_0$ spherical perceptrons and employing Random Duality Theory (RDT) to obtain probabilistic upper bounds on the capacity. The authors introduce weak and strong notions of injectivity, derive explicit bounds for 2-, 3-, and general-$l$-layer nets, and reveal an expansion-saturation effect where per-layer expansion approaches unity after a few layers (notably around 4 layers). To sharpen the bounds, they develop lifted (and partially lifted) RDT variants, which produce modest but meaningful reductions in the upper bounds, particularly in shallow networks. The results have potential implications for deep compressed sensing and network architecture design by informing the minimal necessary expansion across layers and guiding practical recovery guarantees. Overall, the approach provides a principled, quantitative framework for predicting injectivity behavior in deep ReLU models with Gaussian weights and offers a path toward tighter bounds through lifted-RDT techniques.

Abstract

We study deep ReLU feed-forward neural networks (NNs) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden-layer architecture, it is defined as the minimal ratio between the number of the network's outputs and inputs that ensures unique recoverability of the input from a realizable output. Strong recent progress in precisely studying single ReLU layer injectivity properties is here moved to the deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single-layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and, implicitly, of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery to practical use. From these we observe a rapidly decreasing tendency in the needed layers' expansions, i.e., we observe a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach the level of no needed expansion -- a result that fairly closely resembles observations made in practical experiments and that has so far remained completely out of reach of existing mathematical methodologies.
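
To make the notion of "unique recoverability" concrete, the following is a minimal NumPy sketch, not the paper's RDT machinery. It builds a Gaussian ReLU network from a list of expansion coefficients and, when the first layer is too narrow, constructs two distinct inputs that produce the same output. The helper names (`layer_widths`, `gaussian_relu_net`, `first_layer_collision`) and the convention $\alpha_i = m_i/m_{i-1}$ are assumptions made for illustration. The witness relies on a standard fact: if fewer than $n$ rows of a layer are active at a generic input $x$, one can move $x$ along a direction orthogonal to the active rows without changing the layer's output.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def layer_widths(n, alphas):
    """Widths m_0 = n, m_i = round(alpha_i * m_{i-1}) (assumed convention)."""
    widths = [n]
    for a in alphas:
        widths.append(int(round(a * widths[-1])))
    return widths


def gaussian_relu_net(widths, rng):
    """One i.i.d. standard normal weight matrix per layer, shape (m_i, m_{i-1})."""
    return [rng.standard_normal((m_out, m_in))
            for m_in, m_out in zip(widths[:-1], widths[1:])]


def forward(weights, x):
    """Apply x -> relu(A x) layer by layer."""
    for A in weights:
        x = relu(A @ x)
    return x


def first_layer_collision(A, x):
    """Return x2 != x with relu(A @ x2) equal to relu(A @ x) up to rounding,
    or None if this particular x yields no such witness."""
    m, n = A.shape
    z = A @ x
    active = z > 0
    if active.sum() >= n:
        return None  # active rows may span R^n; no witness from this x
    if active.any():
        # Fewer than n active rows: the last right-singular vector of the
        # active submatrix is orthogonal to all of them.
        v = np.linalg.svd(A[active], full_matrices=True)[2][-1]
    else:
        v = np.eye(n)[0]  # no active rows: any direction works
    inactive = ~active
    if inactive.any():
        # Step small enough that every inactive pre-activation stays <= 0.
        slopes = np.abs(A[inactive] @ v)
        margins = -z[inactive]
        t = 0.5 * np.min(margins / np.maximum(slopes, 1e-12))
    else:
        t = 1.0
    return x + t * v


rng = np.random.default_rng(0)
widths = layer_widths(8, [0.75, 2.0])  # [8, 6, 12]: first layer contracts
weights = gaussian_relu_net(widths, rng)
x = rng.standard_normal(widths[0])
x2 = first_layer_collision(weights[0], x)
same_output = np.allclose(forward(weights, x), forward(weights, x2))
print(np.linalg.norm(x2 - x) > 0, same_output)
```

Because the first layer here has fewer rows than inputs, a collision always exists, and it propagates through every later layer regardless of how much those layers expand. The capacity question studied in the paper is the much finer one of how large the expansions must be so that no such collision exists for any input.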

Paper Structure (14 sections, 5 theorems, 96 equations, 8 tables)

Introduction
Mathematical preliminaries, related work, and contributions
Related prior work
Our contributions
2-layer ReLU NN
Upper-bounding $\alpha_{ReLU}^{(inj)}(\alpha_1)$ via Random Duality Theory (RDT)
Multi-layer ReLU NN
3-layer ReLU NN
Upper-bounding $\alpha_{ReLU}^{(inj)}(\alpha_1,\alpha_2)$ via Random Duality Theory (RDT)
$l$-layer ReLU NN
Lifted RDT
Lowering upper bounds on $\alpha_{ReLU}^{(inj)}(\alpha_1)$ via pl RDT
Lifted RDT -- $l$-layer ReLU NN
Conclusion

Key Result

Lemma 1

Consider a sequence of positive integers, $n=m_0,m_1,m_2$, in a high-dimensional linear regime with corresponding expansion coefficients $\alpha_1,\alpha_2$, and assume that $\alpha_1$ is injectively admissible. Assume a 2-layer ReLU NN with architecture ${\mathcal{A}}_{1:2}=[A^{(1)},A^{(2)}]$ [...]

Theorems & Definitions (10)

Lemma 1
proof
Theorem 1
proof
Lemma 2
proof
Theorem 2
proof
Theorem 3
proof
