Mathematical Foundations of Deep Learning

Xiaojing Ye

Abstract

This draft book offers a comprehensive and rigorous treatment of the mathematical principles underlying modern deep learning. It spans core theoretical topics: the approximation capabilities of deep neural networks, the theory and algorithms of optimal control and reinforcement learning integrated with deep learning techniques, and the contemporary generative models that drive today's advances in artificial intelligence.

Paper Structure

This paper contains 98 sections, 62 theorems, 1042 equations, 25 figures, and 9 algorithms.

Deep Neural Networks
Function Approximations
Shallow and Deep Neural Networks
Universal Approximation Theorem
Feed-forward network and its size
Lebesgue and Sobolev spaces and their norms
Proof of Universal Approximation Theorem
Network Architecture Design
Examples of Activation Functions
Sigmoid
Hyperbolic Tangent (tanh)
Rectified Linear Unit (ReLU)
Exponential Linear Unit (ELU)
Continuously Differentiable Exponential Linear Unit (CELU)
Gaussian Error Linear Unit (GELU)
...and 83 more sections

Key Result

Proposition 1.4.1

For any $\epsilon>0$, the function $f(x) = x^{2}$ defined on $[0,1]$ can be approximated to within error $\epsilon$ by a ReLU net having $O(\log(1/\epsilon))$ weights.
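
The logarithmic weight count can be illustrated numerically. The sketch below assumes the standard sawtooth construction, in which a tent map built from three ReLU units is composed with itself and the compositions are subtracted from $x$ with weights $4^{-s}$. The book's own definitions of $g_{s}$ and $f_{m}$ are not reproduced on this page and may differ in detail, and the names `tent`, `approx_square`, and `max_error` are illustrative only.

```python
def relu(z):
    """Rectified linear unit."""
    return max(z, 0.0)


def tent(x):
    """Tent map on [0, 1] written with three ReLU units (assumed construction).

    It rises linearly from 0 to 1 on [0, 1/2] and falls back to 0 on [1/2, 1].
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def approx_square(x, m):
    """Evaluate f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4**s.

    Here g_s is the s-fold composition of the tent map, so each extra term
    costs one more layer with a constant number of weights.
    """
    g = x
    out = x
    for s in range(1, m + 1):
        g = tent(g)          # g now holds g_s(x)
        out -= g / 4.0 ** s
    return out


def max_error(m, n_grid=4097):
    """Largest |f_m(x) - x^2| over a uniform dyadic grid on [0, 1]."""
    xs = [i / (n_grid - 1) for i in range(n_grid)]
    return max(abs(approx_square(x, m) - x * x) for x in xs)


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, max_error(m), 4.0 ** -(m + 1))
```

Under this construction $f_{m}$ is the piecewise linear interpolant of $x^{2}$ at the points $k/2^{m}$, so the maximum error is $4^{-(m+1)}$. Reaching error $\epsilon$ therefore takes about $\tfrac{1}{2}\log_{2}(1/\epsilon)$ compositions, each adding a constant number of weights, which is consistent with the $O(\log(1/\epsilon))$ bound in the proposition.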

Figures (25)

Figure 1: A shallow neural network $f_{\theta}$ with input layer width (dimension) $d=3$, hidden layer width $d_{1} = 4$, and output layer width $1$.
Figure 2: An example of a deep neural network $f_{\theta}$, called a multilayer perceptron, with an input layer of width $d=3$, two hidden layers of widths $d_{1}=4$ and $d_{2}=5$, respectively, and an output layer of width $1$.
Figure 3: A ReLU feed-forward network $f_{\theta}$ with $3$ input neurons $x_{1}$, $x_{2}$, and $x_{3}$. All the other neurons are computation units. The total number of weights and biases of this network is 24, which is the number of computation units (8 in this example, i.e., $h_{1},\dots,h_{7},f_{\theta}$) plus the number of edges (16 in this example).
Figure 4: From left to right: $g_{1}$, $g_{2}$, and $g_{3}$ defined in \ref{eq:gs}.
Figure 5: The ReLU net $f_{m}$ defined in \ref{eq:fm-approx-xsquare} for $m=3$.
...and 20 more figures
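
The counting rule in the Figure 3 caption (one bias per computation unit plus one weight per edge) can be checked with a few lines of Python. This is a minimal sketch: the helper names `count_parameters` and `mlp_counts` are hypothetical, and applying the rule to Figures 1 and 2 assumes those networks are fully connected between consecutive layers with a bias on every non-input neuron, which the captions do not state explicitly.

```python
def count_parameters(num_units, num_edges):
    """Total parameters: one bias per computation unit plus one weight per edge."""
    return num_units + num_edges


def mlp_counts(widths):
    """Return (units, edges) for a fully connected feed-forward network.

    widths = [input width, hidden widths..., output width].
    Input neurons are not computation units, so they carry no bias.
    """
    units = sum(widths[1:])
    edges = sum(a * b for a, b in zip(widths[:-1], widths[1:]))
    return units, edges


if __name__ == "__main__":
    # Figure 3: 8 computation units and 16 edges, as stated in the caption.
    print(count_parameters(8, 16))                       # 24
    # Figure 1 widths (3, 4, 1), assuming full connectivity.
    print(count_parameters(*mlp_counts([3, 4, 1])))      # 21
    # Figure 2 widths (3, 4, 5, 1), assuming full connectivity.
    print(count_parameters(*mlp_counts([3, 4, 5, 1])))   # 47
```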

Theorems & Definitions (169)

Example 1.1.1: Linear regression
Example 1.1.2: Logistic classification
Example 1.2.1: Shallow neural network
Example 1.2.2: Multilayer perceptron
Proposition 1.4.1
proof
Proposition 1.4.2
proof
Proposition 1.4.3
proof
...and 159 more
