On the growth of the parameters of approximating ReLU neural networks

Erion Morina, Martin Holler

TL;DR

The paper investigates how the magnitudes of the parameters realizing approximating neural networks grow as the networks approximate a smooth function, rather than how the approximation error scales with the architecture. It proves that deep ReLU networks achieving state-of-the-art approximation rates exhibit at most polynomial, not exponential, parameter growth, and it provides explicit constructions and bounds; a contrasting negative result shows that shallow networks can incur exponential parameter growth for certain activations. The work situates its results relative to the literature, noting that ReQU-based networks can yield uniformly bounded parameters, while the ReLU-based growth rate is favorable for high-dimensional input. The findings have implications for error analysis and for consistency results in neural network training, and they underscore the advantage of deeper architectures in controlling parameter magnitudes during approximation.

Abstract

This work focuses on the analysis of fully connected feed forward ReLU neural networks as they approximate a given, smooth function. In contrast to conventionally studied universal approximation properties under increasing architectures, e.g., in terms of width or depth of the networks, we are concerned with the asymptotic growth of the parameters of approximating networks. Such results are of interest, e.g., for error analysis or consistency results for neural network training. The main result of our work is that, for a ReLU architecture with state-of-the-art approximation error, the realizing parameters grow at most polynomially. The obtained rate with respect to a normalized network size is compared to existing results and is shown to be superior in most cases, in particular for high-dimensional input.

Paper Structure (22 sections, 14 theorems, 150 equations, 2 figures, 2 tables)


Key Result

Theorem 1

Let $d,q\in \mathbb{N}$ and $f\in \mathcal{C}^q([0,1]^d)$. Then for any $N,L\in \mathbb{N}$ there exists a ReLU feed forward neural network $f_{N,L}$ with width of order $N\log N$ and depth of order $L^2\log L$ such that [...]. The parameters of $f_{N,L}$ grow asymptotically as $\mathcal{O}(\max(N^{(6q-3)/d}L^{(6q-2)/d}, N^2L^3))$.
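
To get a feel for the two competing terms in the growth bound of Theorem 1, the following minimal sketch evaluates $\max(N^{(6q-3)/d}L^{(6q-2)/d}, N^2L^3)$ for concrete values and reports which term is larger. The function names `growth_terms` and `growth_bound_order` are hypothetical and chosen here for illustration only, and the constant hidden in the $\mathcal{O}$ notation is ignored, so the numbers indicate orders of growth and not actual parameter sizes.

```python
def growth_terms(N, L, q, d):
    """Return the two terms inside the max of the Theorem 1 bound.

    Illustrative helper (not from the paper); the constant hidden in the
    O-notation is ignored.
    """
    # Term depending on the smoothness q and the input dimension d.
    smoothness_term = N ** ((6 * q - 3) / d) * L ** ((6 * q - 2) / d)
    # Term depending only on the size parameters N and L.
    size_term = N ** 2 * L ** 3
    return smoothness_term, size_term


def growth_bound_order(N, L, q, d):
    """Return the larger term and a label saying which one it is."""
    smoothness_term, size_term = growth_terms(N, L, q, d)
    if smoothness_term > size_term:
        return smoothness_term, "smoothness"
    return size_term, "size"


if __name__ == "__main__":
    # Low smoothness, higher dimension: the N^2 L^3 term is the larger one.
    print(growth_bound_order(4, 2, q=1, d=3))
    # High smoothness, low dimension: the q- and d-dependent term is larger.
    print(growth_bound_order(4, 2, q=4, d=2))
```

For $N, L \geq 1$, the term $N^2L^3$ is the larger one whenever $6q-3 \leq 2d$ and $6q-2 \leq 3d$, since both exponents of the first term are then no larger than those of the second. In that regime the order of the bound does not depend on $q$ or $d$.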

Figures (2)

  • Figure 1: Overview of structure of results in lu_main
  • Figure 2: The approximated function $f$ in \ref{special_f} and the modification $f^*$ in \ref{special_f_mod}.

Theorems & Definitions (25)

  • Theorem: Simplification of Theorem \ref{theorem_lu_param_growth}
  • Definition 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • Lemma 7
  • ...and 15 more