Deep ReLU networks -- injectivity capacity upper bounds
Mihailo Stojnic
TL;DR
This work quantifies the injectivity capacity of deep ReLU networks by linking $l$-layer injectivity to $l$-extended $\ell_0$ spherical perceptrons and by employing Random Duality Theory (RDT) to obtain probabilistic upper bounds on the capacity. The authors introduce weak and strong notions of injectivity, derive explicit bounds for 2-, 3-, and general $l$-layer nets, and reveal an expansion saturation effect: the per-layer expansion needed approaches unity after a few layers (around 4). To sharpen the bounds, they develop lifted (and partially lifted) RDT variants, which yield modest but meaningful reductions in the upper bounds, particularly for shallow networks. The results bear on deep compressed sensing and network architecture design, since they indicate the minimal expansion needed across layers and can guide practical recovery guarantees. Overall, the approach provides a principled, quantitative framework for predicting injectivity in deep ReLU networks with Gaussian weights and, through lifted RDT, a path toward tighter bounds.
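The link between injectivity and recovery mentioned above can be illustrated with a small experiment. The sketch below is ours and is not the paper's RDT machinery: the helper names (`relu_layer`, `invert_layer`, `invert_network`, `gaussian_net`) and the widths are hypothetical. It relies on one elementary sufficient condition. Every output entry $y_i > 0$ forces $a_i^T z = y_i$ for any preimage $z$, so if those rows have full column rank, the preimage of that particular output is unique and can be found by least squares. Peeling the layers from the output back to the input extends this to a deep net. Note that this certifies uniqueness only at one sampled output, whereas injectivity (and hence the capacity) requires it for every input.

```python
import random

# Hypothetical helpers for illustration only; not the paper's RDT method.

def relu_layer(A, x):
    """One ReLU layer: entrywise max(A x, 0) for a dense list-of-lists A."""
    return [max(sum(a * xj for a, xj in zip(row, x)), 0.0) for row in A]


def solve_least_squares(rows, rhs, n, tol=1e-9):
    """Solve rows @ z = rhs via the normal equations (Gauss-Jordan with
    partial pivoting). Returns None if rows is numerically rank deficient."""
    # Augmented n x (n + 1) system [R^T R | R^T rhs].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * b for r, b in zip(rows, rhs))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[piv][col]) < tol:
            return None  # the active rows do not span R^n
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * c for a, c in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def invert_layer(A, y, tol=1e-9):
    """Recover z from y = max(A z, 0) when a uniqueness certificate exists.

    Every row with y_i > 0 forces a_i . z = y_i for ANY preimage z. If those
    rows have full column rank, the preimage of this particular y is unique.
    Rows with y_i == 0 only give inequalities and are ignored here."""
    n = len(A[0])
    active = [i for i, yi in enumerate(y) if yi > tol]
    if len(active) < n:
        return None  # too few equality constraints to pin z down
    return solve_least_squares([A[i] for i in active], [y[i] for i in active], n)


def invert_network(layers, y):
    """Peel the layers off from the output back to the input."""
    z = y
    for A in reversed(layers):
        z = invert_layer(A, z)
        if z is None:
            return None
    return z


def gaussian_net(widths, rng):
    """Layers A_k with i.i.d. N(0, 1) entries, shape widths[k] x widths[k-1]."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(widths[k - 1])]
             for _ in range(widths[k])] for k in range(1, len(widths))]


if __name__ == "__main__":
    rng = random.Random(0)
    widths = [3, 16, 64]  # hypothetical: n = 3 inputs, two hidden expansions
    layers = gaussian_net(widths, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(widths[0])]
    y = x
    for A in layers:
        y = relu_layer(A, y)
    x_hat = invert_network(layers, y)
    print("recovered:", x_hat is not None)
    if x_hat is not None:
        print("max error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

This layer-wise check needs at least as many strictly positive outputs as unknowns at every layer, so with roughly half of the rows active, each layer in the example expands by more than a factor of two. Because it ignores the nonnegativity of hidden activations and tests a single input, it should not be read as an estimate of the injectivity capacity.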
Abstract
We study deep ReLU feed-forward neural networks (NN) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden-layer architecture, it is defined as the minimal ratio between the number of the network's outputs and inputs that ensures unique recoverability of the input from a realizable output. Strong recent progress in precisely studying single ReLU layer injectivity properties is here extended to the deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and, implicitly, of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery to practical use. From these we observe a rapidly decreasing tendency in the needed layers' expansions, i.e., a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach the level at which no expansion is needed, a result that fairly closely resembles observations made in practical experiments and that has so far remained completely out of reach of all existing mathematical methodologies.
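The verbal definition of the injectivity capacity can be written compactly as below. This is a hedged sketch: the symbols ($A^{(k)}$, $m_k$, $\alpha_k$, $\alpha^{\mathrm{inj}}$) are our own, the Gaussian weights follow the TL;DR, and reading the capacity as a "with probability tending to one, as $n \to \infty$" threshold is our assumption about the usual large-dimensional regime rather than a quote from the paper. Here $\max$ acts entrywise, and the hidden expansions $\alpha_1, \dots, \alpha_{l-1}$ encode the given hidden-layer architecture.

```latex
\begin{aligned}
&x^{(0)} = x \in \mathbb{R}^{n}, \qquad
 x^{(k)} = \max\!\big(A^{(k)} x^{(k-1)},\, 0\big), \qquad
 A^{(k)} \in \mathbb{R}^{m_k \times m_{k-1}},\ m_0 = n,\ k = 1, \dots, l,\\
&f(x) = x^{(l)}, \qquad
 \alpha_k = \frac{m_k}{m_{k-1}}, \qquad
 \frac{m_l}{n} = \prod_{k=1}^{l} \alpha_k,\\
&f \text{ injective} \iff \big( f(x) = f(x') \Rightarrow x = x' \big)
 \quad \text{for all } x, x' \in \mathbb{R}^{n},\\
&\alpha^{\mathrm{inj}} = \min \Big\{ \tfrac{m_l}{n} \;:\; f \text{ injective with probability} \to 1 \text{ as } n \to \infty,\ \alpha_1, \dots, \alpha_{l-1} \text{ fixed} \Big\}.
\end{aligned}
```

In this notation, the expansion saturation effect roughly says that the per-layer expansions $\alpha_k$ needed for injectivity approach $1$ after about four layers.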
