Minimum Width of Deep Narrow Networks for Universal Approximation

Xiao-Song Yang; Qi Zhou; Xuan Zhou

TL;DR

This work studies the minimum width $w_{min}$ that deep narrow fully connected networks need for universal approximation, and how that width depends on the input dimension $d_x$, the output dimension $d_y$, and the activation function. The authors combine geometric and topological arguments, including an approach based on the Poincaré-Miranda theorem, to derive a lower bound for injective activations and upper bounds for ELU/SELU and other ReLU-like activations. The main results are $w_{min} \leq \max(2d_x+1, d_y)$ for ELU/SELU (with $w_{min}=2d_x+1$ when $d_y=2d_x$), $d_x+1 \leq w_{min} \leq d_x+d_y$ for LeakyReLU/ELU/CELU/SELU/Softplus, and $w_{min} \ge d_y+\mathbf{1}_{d_x<d_y\leq 2d_x}$ for activations that are injective or can be uniformly approximated by injective functions. Numerical experiments on rotations and DISK datasets illustrate the width–depth trade-offs. Together, the bounds indicate how wide a network must be to retain universal approximation under each activation function.

Abstract

Determining the minimum width of fully connected neural networks has become a fundamental problem in recent theoretical studies of deep neural networks. In this paper, we study lower and upper bounds on the minimum width that fully connected neural networks require for universal approximation capability, which is important in network design and training. We show that $w_{min}\leq\max(2d_x+1, d_y)$ also holds for networks with ELU and SELU activation functions, and that this upper bound is attained when $d_y=2d_x$, where $d_x$ and $d_y$ denote the input and output dimensions, respectively. We also show that $d_x+1\leq w_{min}\leq d_x+d_y$ for networks with LeakyReLU, ELU, CELU, SELU, and Softplus activation functions, by proving that the ReLU activation function can be approximated by these activation functions. In addition, when the activation function is injective or can be uniformly approximated by a sequence of injective functions (e.g., ReLU), we present a new proof of the inequality $w_{min}\ge d_y+\mathbf{1}_{d_x<d_y\leq2d_x}$ by constructing a more intuitive example via a new geometric approach based on the Poincaré-Miranda theorem.
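To make the three inequalities concrete, the following Python sketch evaluates each bound for given dimensions. It is only an illustration of the arithmetic in the abstract: the function names are hypothetical, and the function-space setting (norm, domain) under which each bound holds is not specified here and is ignored. The final loop assumes that ELU and SELU count as injective activations (with positive parameters they are strictly increasing), so both the lower and the upper bound apply to them.

```python
def injective_lower_bound(d_x: int, d_y: int) -> int:
    """Lower bound d_y + 1[d_x < d_y <= 2*d_x] for injective activations
    (or activations uniformly approximable by injective ones, e.g. ReLU)."""
    indicator = 1 if d_x < d_y <= 2 * d_x else 0
    return d_y + indicator


def elu_selu_upper_bound(d_x: int, d_y: int) -> int:
    """Upper bound max(2*d_x + 1, d_y) stated for ELU and SELU."""
    return max(2 * d_x + 1, d_y)


def relu_like_bounds(d_x: int, d_y: int) -> tuple[int, int]:
    """Bounds d_x + 1 <= w_min <= d_x + d_y stated for
    LeakyReLU, ELU, CELU, SELU, and Softplus."""
    return d_x + 1, d_x + d_y


# Illustrative dimensions with d_y = 2 * d_x, the regime in which the
# abstract says the ELU/SELU upper bound is attained.
for d_x in (1, 2, 3):
    d_y = 2 * d_x
    lower = injective_lower_bound(d_x, d_y)   # assumes ELU/SELU are injective
    upper = elu_selu_upper_bound(d_x, d_y)
    print(f"d_x={d_x}, d_y={d_y}: {lower} <= w_min <= {upper}")
```

When $d_y=2d_x$ the indicator equals 1, so the lower bound $d_y+1$ and the upper bound $\max(2d_x+1, d_y)$ both equal $2d_x+1$, which matches the stated case of equality.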
