The Optimal Condition Number for ReLU Function

Yu Xia; Haoyu Zhou

TL;DR

This work establishes fundamental limits and optimality results for the stability of the ReLU map in single neural network layers. It proves a universal lower bound $\beta_{A,b}\ge\sqrt{2}$ and shows that Gaussian random weights with zero bias asymptotically attain this bound, implying distance-preserving behavior in wide random networks. A general cone-based framework (Theorem Lipschitz_Result) provides explicit bi-Lipschitz bounds for Gaussian matrices with sample complexity depending on the Gaussian width $\omega((S-S)\cap\mathbb{B}^n)$, improving prior $\omega^4$ dependencies to $\omega^2$; the analysis combines large- and small-distance regimes to bound $\frac{1}{m}\|\sigma(Ax)-\sigma(Ay)\|_2^2$ in terms of $\|x-y\|_2^2$ and an angular term $\phi(x,y)$. The results theoretically justify Gaussian initialization as distance-preserving and connect random-weight propagation to precise geometric and probabilistic tools, offering rigorous foundations for stable signal propagation in deep networks.

Abstract

ReLU is a widely used activation function in deep neural networks. This paper explores the stability properties of the ReLU map. For any weight matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and bias vector $\boldsymbol{b} \in \mathbb{R}^{m}$ at a given layer, we define the condition number $\beta_{\boldsymbol{A},\boldsymbol{b}}$ as $\beta_{\boldsymbol{A},\boldsymbol{b}} = \frac{\mathcal{U}_{\boldsymbol{A},\boldsymbol{b}}}{\mathcal{L}_{\boldsymbol{A},\boldsymbol{b}}}$, where $\mathcal{U}_{\boldsymbol{A},\boldsymbol{b}}$ and $\mathcal{L}_{\boldsymbol{A},\boldsymbol{b}}$ are the upper and lower Lipschitz constants, respectively. We first demonstrate that for any given $\boldsymbol{A}$ and $\boldsymbol{b}$, the condition number satisfies $\beta_{\boldsymbol{A},\boldsymbol{b}} \geq \sqrt{2}$. Moreover, when the weights of the network at a given layer are initialized as random i.i.d. Gaussian variables and the bias term is set to zero, the condition number asymptotically approaches this lower bound. This theoretical finding suggests that Gaussian weight initialization is optimal for preserving distances in the context of random deep neural network weights.
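As a rough numerical illustration of the claim above, the following sketch estimates the condition number for an i.i.d. standard Gaussian $\boldsymbol{A}$ with zero bias. It is not the paper's method. It assumes the Lipschitz constants are taken with respect to $\frac{1}{m}\|\sigma(\boldsymbol{A}x)-\sigma(\boldsymbol{A}y)\|_2^2 / \|x-y\|_2^2$ (the ratio is scale-invariant, so the normalization does not affect $\beta$), and it only samples finitely many pairs, so it probes the true supremum and infimum from inside. The function name `empirical_condition_number` and its parameters are hypothetical. Two structured pairs are added because, in expectation over Gaussian rows, an antipodal pair $y=-x$ gives ratio $1/4$ and a collinear pair $y=2x$ gives ratio $1/2$.

```python
import numpy as np


def relu(z):
    """Elementwise ReLU."""
    return np.maximum(z, 0.0)


def empirical_condition_number(m, n, num_pairs=200, seed=0):
    """Monte Carlo estimate of beta for Gaussian A (m x n) and zero bias.

    Hypothetical helper for illustration only: it evaluates the squared
    distortion ratio on sampled pairs (x, y) and returns
    sqrt(max ratio / min ratio).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))  # i.i.d. N(0, 1) weights, bias b = 0

    X = rng.standard_normal((num_pairs, n))
    Y = rng.standard_normal((num_pairs, n))
    Y[0] = -X[0]       # antipodal pair: expected ratio 1/4
    Y[1] = 2.0 * X[1]  # collinear pair, same direction: expected ratio 1/2

    # (1/m) * ||relu(Ax) - relu(Ay)||^2 for every pair
    num = np.sum((relu(X @ A.T) - relu(Y @ A.T)) ** 2, axis=1) / m
    # ||x - y||^2 for every pair
    den = np.sum((X - Y) ** 2, axis=1)

    ratios = num / den
    return float(np.sqrt(ratios.max() / ratios.min()))
```

For a wide layer, for example `empirical_condition_number(m=20000, n=5)`, the estimate should land near $\sqrt{2}\approx 1.414$; for small `m` the sampled ratios fluctuate more and the estimate drifts away from that value.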
