Injectivity capacity of ReLU gates

Mihailo Stojnic

TL;DR

Determining the ReLU injectivity capacity (ratio of the number of layer's inputs and outputs) is established as isomorphic to determining the capacity of the so-called $\ell_0$ spherical perceptron. Closed-form relations among the key lifting parameters are of great importance in handling all the required numerical work.

Abstract

We consider the injectivity property of ReLU network layers. Determining the ReLU injectivity capacity (ratio of the number of layer's inputs and outputs) is established as isomorphic to determining the capacity of the so-called $\ell_0$ spherical perceptron. Employing fully lifted random duality theory (fl RDT), a powerful program is developed and utilized to handle the $\ell_0$ spherical perceptron and, implicitly, the injectivity of ReLU layers. To put the entire fl RDT machinery to practical use, a sizeable set of numerical evaluations is conducted as well. The lifting mechanism is observed to converge remarkably fast, with relative corrections in the estimated quantities not exceeding $\sim 0.1\%$ already on the third level of lifting. Closed-form explicit analytical relations among key lifting parameters are uncovered as well. In addition to being of incredible importance in handling all the required numerical work, these relations also shed new light on beautiful parametric interconnections within the lifting structure. Finally, the obtained results are also shown to fairly closely match the replica predictions from [40].
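To make the notion of layer injectivity concrete, the following is a minimal sketch and not the paper's fl RDT program. It relies on one elementary fact: if some input $x$ leaves fewer than $n$ rows of $A$ with strictly positive pre-activation (and none exactly zero), then the layer $x \mapsto \max(Ax, 0)$ is not injective, because $x$ can be moved along a direction orthogonal to the active rows without changing the output. The function name `find_noninjectivity_witness`, the random search, and all parameters are assumptions made for illustration only.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def find_noninjectivity_witness(A, trials=2000, seed=0):
    """Randomly search for x with fewer than n strictly active rows of A.

    Returns (x, v) with v != 0 and relu(A @ x) == relu(A @ (x + v)) up to
    rounding, or None if the random search finds no such x. A None result
    does NOT prove injectivity; it only means no witness was sampled.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    for _ in range(trials):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)
        z = A @ x
        if np.any(z == 0.0):
            continue  # skip degenerate samples with a zero pre-activation
        active = z > 0
        k = int(active.sum())
        if k >= n:
            continue  # this x is no witness; try another sample
        if k == 0:
            v = np.eye(n)[0]  # no active rows: any direction works
        else:
            # With k < n rows, the last right singular vector of the
            # active rows is orthogonal to all of them.
            _, _, vt = np.linalg.svd(A[active], full_matrices=True)
            v = vt[-1]
        # Pick a step small enough that inactive rows stay negative.
        slack = -z[~active]
        drift = A[~active] @ v
        growing = drift > 0
        t = 0.5 * np.min(slack[growing] / drift[growing]) if growing.any() else 1.0
        return x, t * v
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))  # m = n: far too few outputs
    witness = find_noninjectivity_witness(A)
    if witness is not None:
        x, v = witness
        print(np.allclose(relu(A @ x), relu(A @ (x + v))))
```

When $m$ is small relative to $n$, a random $x$ typically already has fewer than $n$ active rows, so a witness is found almost immediately. For larger ratios $m/n$ such inputs become rare, and a random search says nothing reliable about where the actual capacity lies.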

Paper Structure

This paper contains 26 sections, 5 theorems, 160 equations, 4 tables.

Key Result

Theorem 1

[Stojnicflrdt23] Consider large $n$ linear regime with $\alpha\triangleq \lim_{n\rightarrow\infty} \frac{m}{n}$, remaining constant as $n$ grows. Let ${\mathcal{X}}\subseteq {\mathbb R}^{n}$ and ${\mathcal{Y}}\subseteq {\mathbb R}^m$ be two given sets and let the elements of $A\in{\mathbb R}^{m\times n}$ [...] Let $\hat{{\bf p}_0}\rightarrow 1$, $\hat{{\bf q}_0}\rightarrow 1$, and $\hat{{\bf c}_0}\rightarrow \dots$ [...]

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Theorem 2
  • proof
  • Corollary 2
  • proof
  • Corollary 3
  • proof
  • ...and 2 more