Approximation Power of Deep Neural Networks: an explanatory mathematical survey

Owen Davis; Mohammad Motamed

TL;DR

This paper surveys the approximation capabilities of deep neural networks, focusing on the expressive power of feed-forward and residual architectures and on the formulation of network training as an optimization problem. It synthesizes classical density results (e.g., Weierstrass and Pinkus) with modern depth-based theories, showing that deep ReLU networks and deep Fourier networks achieve favorable error–complexity trade-offs, including exponential convergence for certain self-similar targets. The work also provides concrete error estimates for Fourier and ReLU networks, relates network width and depth to approximation quality, and illustrates both theoretical and numerical insights through structured examples. Overall, it establishes a rigorous mathematical foundation for understanding when and why deep networks can outperform traditional approximation methods on bounded, potentially irregular targets, and it outlines key open questions such as the effect of dimensionality and spectral bias.
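The depth claim above can be made concrete with a classical tent-map (sawtooth) construction. What follows is a minimal NumPy sketch under our own assumptions, not necessarily the construction used in the survey. The tent map $g$ on $[0,1]$ needs only three ReLU units, and one can check that its $s$-fold composition satisfies $g_s(x) = 2\,\mathrm{dist}(2^{s-1}x, \mathbb{Z})$, a sawtooth with $2^{s-1}$ teeth. A network of depth $m$ and constant width can therefore form the partial sum $\sum_{s=1}^{m} g_s(x)/2^s$ of the self-similar Takagi function $T(x) = \sum_{n \ge 0} \mathrm{dist}(2^n x, \mathbb{Z})/2^n$. Since $0 \le g_s \le 1$, the truncation error is at most $\sum_{s>m} 2^{-s} = 2^{-m}$, so it decays exponentially in depth. The names `tent`, `takagi_relu` and `takagi_reference` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # Tent map on [0, 1] built from three ReLU units:
    # g(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def takagi_relu(x, m):
    """Depth-m ReLU approximation sum_{s=1}^m g_s(x) / 2^s of the Takagi function."""
    out = np.zeros_like(x)
    g = x
    for s in range(1, m + 1):
        g = tent(g)            # one more hidden layer of three ReLU units
        out += g / 2.0**s      # running sum stays >= 0, so ReLU layers can carry it
    return out

def takagi_reference(x, terms=52):
    # Truncated series T(x) = sum_{n>=0} dist(2^n x, Z) / 2^n, used as ground truth.
    n = np.arange(terms)[:, None]
    y = (2.0**n) * x[None, :]
    return np.sum(np.abs(y - np.round(y)) / 2.0**n, axis=0)

# Usage: the maximum error on a grid stays below the bound 2^-m.
x = np.linspace(0.0, 1.0, 2001)
ref = takagi_reference(x)
for m in (2, 4, 8):
    err = np.max(np.abs(takagi_relu(x, m) - ref))
    print(f"depth {m}: max error {err:.2e} (bound {2.0**-m:.2e})")
```

Width stays fixed while the error halves with every added layer. This is the kind of depth-driven, exponential error decay for self-similar targets that the TL;DR refers to.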

Abstract

This survey provides an in-depth and explanatory review of the approximation properties of deep neural networks, with a focus on feed-forward and residual architectures. The primary objective is to examine how effectively neural networks approximate target functions and to identify conditions under which they outperform traditional approximation methods. Key topics include the nonlinear, compositional structure of deep networks and the formalization of neural network tasks as optimization problems in regression and classification settings. The survey also addresses the training process, emphasizing the role of stochastic gradient descent and backpropagation in solving these optimization problems, and highlights practical considerations such as activation functions, overfitting, and regularization techniques. Additionally, the survey explores the density of neural networks in the space of continuous functions, comparing the approximation capabilities of deep ReLU networks with those of other approximation methods. It discusses recent theoretical advancements in understanding the expressiveness and limitations of these networks. A detailed error-complexity analysis is also presented, focusing on error rates and computational complexity for neural networks with ReLU and Fourier-type activation functions in the context of bounded target functions with minimal regularity assumptions. Alongside recent known results, the survey introduces new findings, offering a valuable resource for understanding the theoretical foundations of neural network approximation. Concluding remarks and further reading suggestions are provided.
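The training process described in the abstract (mini-batch stochastic gradient descent with backpropagation on a regression loss) can be sketched as follows. This is a minimal NumPy illustration, not the survey's algorithm. It assumes one hidden ReLU layer, the mean-squared-error loss, He-style Gaussian initialization, and hand-picked hyper-parameters (width 32, learning rate 0.02, batch size 32, target $\sin(\pi x)$), all chosen only for illustration. The names `train_relu_net` and `mse` are hypothetical.

```python
import numpy as np

def mse(params, x, y):
    W1, b1, W2, b2 = params
    pred = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

def train_relu_net(x, y, width=32, lr=0.02, epochs=200, batch=32, seed=0):
    """Mini-batch SGD with hand-coded backpropagation for
    f(x) = relu(x W1 + b1) W2 + b2 under the mean-squared-error loss."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, width))
    b1 = np.zeros(width)
    W2 = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, 1))
    b2 = np.zeros(1)
    params = (W1, b1, W2, b2)             # arrays are updated in place below
    losses = [mse(params, x, y)]          # loss before any update
    for _ in range(epochs):
        perm = rng.permutation(n)         # reshuffle the data once per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            xb, yb = x[idx], y[idx]
            # Forward pass, keeping intermediate values for the backward pass.
            z = xb @ W1 + b1
            h = np.maximum(z, 0.0)
            pred = h @ W2 + b2
            # Backward pass: chain rule from the output layer back to the input layer.
            g_pred = 2.0 * (pred - yb) / len(idx)
            g_W2 = h.T @ g_pred
            g_b2 = g_pred.sum(axis=0)
            g_z = (g_pred @ W2.T) * (z > 0)   # ReLU derivative is 0 or 1
            g_W1 = xb.T @ g_z
            g_b1 = g_z.sum(axis=0)
            # Gradient-descent step using the mini-batch gradient.
            W1 -= lr * g_W1
            b1 -= lr * g_b1
            W2 -= lr * g_W2
            b2 -= lr * g_b2
        losses.append(mse(params, x, y))  # full training loss after each epoch
    return params, losses

# Usage: fit sin(pi x) on [-1, 1] from 256 random samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = np.sin(np.pi * x)
params, losses = train_relu_net(x, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Writing the backward pass by hand makes the chain rule explicit. Each gradient is the upstream gradient multiplied by the local derivative of one layer, and backpropagation automates exactly this for deeper networks.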

Paper Structure (54 sections, 20 theorems, 191 equations, 24 figures, 1 algorithm)

This paper contains 54 sections, 20 theorems, 191 equations, 24 figures, 1 algorithm.

Neural networks: formalization and key concepts
Feed-forward neural networks
Applications of neural networks
Network training as an optimization problem
Solving the optimization problem
Gradient descent approach
Backpropagation
Mini-batch stochastic GD and backpropagation
Activation functions
Loss functions
Overfitting and regularization
Adding penalty terms
Early stopping
Dropout
Validation and hyper-parameter tuning
...and 39 more sections
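Early stopping is the simplest of the regularization techniques listed in the outline to state in code. The sketch below assumes a common "patience" rule, which may differ from the variant discussed in the survey. Training halts once the validation loss has failed to improve for `patience` consecutive epochs, and the weights from the best epoch are kept. The function name and the sample losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the index of the epoch whose weights would be kept.

    Scans validation losses in order and stops once `patience`
    consecutive epochs fail to improve on the best loss seen so far.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation curve: improves, then rises as the model starts to overfit.
val = [1.00, 0.60, 0.42, 0.35, 0.36, 0.38, 0.41, 0.45, 0.50]
print(early_stopping_epoch(val, patience=3))  # -> 3
```

In practice `patience` is itself a hyper-parameter, chosen on validation data like the others.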

Key Result

Proposition 1

For any $f_{{\Phi}_j} \in {\mathcal{N}}_{W, L_j}$ with $j=1 , \dotsc, J$, the following holds:

Figures (24)

Figure 1: Graph representation of a feed-forward network with $L=7$ hidden layers, two input neurons, and one output neuron. The number of neurons in hidden layers varies between 2 and 4.
Figure 2: The loss on training data versus test data as the number of epochs increases.
Figure 3: An example of a ridge function $g({\bf w} \cdot {\bf x}) = \sin({\bf w} \cdot {\bf x})$ in two dimensions, where ${\bf x} = (x_1, x_2) \in {\mathbb R}^2$, for three different direction vectors ${\bf w} = (0,1)$ (left), ${\bf w} = (1,0)$ (middle), and ${\bf w} = (1/\sqrt{2},1/\sqrt{2})$ (right).
Figure 4: Graph representation of a special ReLU network in $\hat{\mathcal{N}}_{4,3}$; network weights are shown next to their corresponding edges and biases inside their corresponding neurons.
Figure 5: By adding source and collation channels we can concatenate two standard ReLU networks $\Phi_1$ (in blue) and $\Phi_2$ (in red) and generate a special ReLU network that outputs $f_{\Phi_1} + f_{\Phi_2}$.
...and 19 more figures
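The ridge functions of Figure 3 are also the building blocks of shallow networks. Each hidden neuron of a one-hidden-layer network computes $\sigma({\bf w}_k \cdot {\bf x} + b_k)$, a ridge function in direction ${\bf w}_k$. This is the shallow-network setting of classical density results such as those of Pinkus mentioned in the TL;DR. The NumPy sketch below, with hypothetical helper names, evaluates a ridge function and writes a shallow ReLU network as a weighted sum of ridges. A ridge function is constant along directions orthogonal to ${\bf w}$, so the three panels of Figure 3 differ only by a rotation.

```python
import numpy as np

def ridge(g, w, x):
    """Evaluate the ridge function x -> g(w . x) at points x of shape (n, d)."""
    return g(x @ w)

def shallow_relu_net(x, W, b, c):
    """One-hidden-layer ReLU network sum_k c_k * relu(w_k . x + b_k),
    i.e. a linear combination of ridge functions, one per hidden neuron.
    W has shape (K, d), b and c have shape (K,)."""
    return np.maximum(x @ W.T + b, 0.0) @ c

# Usage: sin(w . x) is unchanged when x moves orthogonally to w.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(5, 2))
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
w_perp = np.array([1.0, -1.0]) / np.sqrt(2.0)
print(np.allclose(ridge(np.sin, w, x), ridge(np.sin, w, x + 0.7 * w_perp)))
```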

Theorems & Definitions (34)

Remark 1
Proposition 1
proof
Proposition 2
proof
Proposition 3
proof
Lemma 1
Lemma 2
Lemma 3
...and 24 more