Mathematical Foundations of Deep Learning

Xiaojing Ye

Abstract

This draft book offers a comprehensive and rigorous treatment of the mathematical principles underlying modern deep learning. The book spans core theoretical topics, from the approximation capabilities of deep neural networks, through the theory and algorithms of optimal control and reinforcement learning integrated with deep learning techniques, to the contemporary generative models that drive today's advances in artificial intelligence.

Paper Structure (98 sections, 62 theorems, 1042 equations, 25 figures, 9 algorithms)

Key Result

Proposition 1.4.1

For any $\epsilon>0$, the function $f(x) = x^{2}$ defined on $[0,1]$ can be approximated to within uniform error $\epsilon$ by a ReLU net having $O(\log(1/\epsilon))$ weights.
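
The proposition can be checked numerically. The sketch below assumes the standard sawtooth construction (due to Yarotsky), which may differ in detail from the book's own definitions: a hat function $g$ built from three ReLU units is composed with itself $s$ times to give $g_{s}$, and $f_{m}(x) = x - \sum_{s=1}^{m} g_{s}(x)/4^{s}$ is the piecewise-linear interpolant of $x^{2}$ on a uniform grid of $2^{m}+1$ points, with uniform error at most $2^{-2m-2}$. The names `tooth` and `square_approx` are illustrative, not taken from the source.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tooth(x):
    # Hat function on [0, 1] written with three ReLU units:
    # rises from 0 to 1 on [0, 1/2], falls back to 0 on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def square_approx(x, m):
    # f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4^s, where g_s is the s-fold
    # composition of the hat function (assumed construction, see lead-in).
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x
    for s in range(1, m + 1):
        g = tooth(g)              # one more layer of constant size
        out = out - g / 4.0**s
    return out

if __name__ == "__main__":
    xs = np.linspace(0.0, 1.0, 10001)
    for m in (1, 2, 3, 6):
        err = np.max(np.abs(square_approx(xs, m) - xs**2))
        # Observed grid error next to the theoretical bound 2^(-2m-2).
        print(m, err, 2.0 ** (-2 * m - 2))
```

Each additional composition adds a constant number of units and weights while shrinking the error by a factor of 4, so reaching error $\epsilon$ takes $m = O(\log(1/\epsilon))$ layers, which matches the weight count in the proposition.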

Figures (25)

  • Figure 1: A shallow neural network $f_{\theta}$ with input layer width (dimension) $d=3$, hidden layer width $d_{1} = 4$, and output layer width $1$.
  • Figure 2: An example of a deep neural network $f_{\theta}$, called a multilayer perceptron, with an input layer of width $d=3$, two hidden layers of widths $d_{1}=4$ and $d_{2}=5$, respectively, and an output layer of width $1$.
  • Figure 3: A ReLU feed-forward network $f_{\theta}$ with $3$ input neurons $x_{1}$, $x_{2}$, and $x_{3}$. All the other neurons are computation units. The total number of weights and biases of this network is 24, which is the number of computation units (8 in this example, i.e., $h_{1},\dots,h_{7},f_{\theta}$) plus the number of edges (16 in this example).
  • Figure 4: From left to right: $g_{1}$, $g_{2}$, and $g_{3}$ defined in \eqref{eq:gs}.
  • Figure 5: The ReLU net $f_{m}$ defined in \eqref{eq:fm-approx-xsquare} for $m=3$.
  • ...and 20 more figures

Theorems & Definitions (169)

  • Example 1.1.1: Linear regression
  • Example 1.1.2: Logistic classification
  • Example 1.2.1: Shallow neural network
  • Example 1.2.2: Multilayer perceptron
  • Proposition 1.4.1
  • proof
  • Proposition 1.4.2
  • proof
  • Proposition 1.4.3
  • proof
  • ...and 159 more