Mathematics of Neural Networks (Lecture Notes Graduate Course)

Bart M. N. Smets

TL;DR

These notes address the mathematical foundations of neural networks from a graduate-math perspective, formalizing supervised learning via $\hat{R}(w)=\frac{1}{N}\sum_{i=1}^N \ell(F(x_i;w),y_i)$ and exploring how regularization, initialization, and architectures influence generalization. They build from basic feed-forward models to deep networks, discuss training with SGD and momentum, and detail CNNs, backpropagation, and adaptive optimizers such as Adagrad, RMSProp, and Adam. A central contribution is the development of an equivariant framework based on Lie groups and homogeneous spaces to design rotation-translation equivariant CNNs via lifting, group convolutions, and projections, with explicit treatment of Haar measures and invariant integrals. The material connects classical geometry with modern architectures, offering a rigorous path to geometry-aware neural networks applicable in vision, physics-informed modeling, and beyond.

Abstract

These are the lecture notes that accompanied the course of the same name that I taught at the Eindhoven University of Technology from 2021 to 2023. The course is intended as an introduction to neural networks for mathematics students at the graduate level and aims to make mathematics students interested in further researching neural networks. It consists of two parts. The first part is a general introduction to deep learning that focuses on introducing the field in a formal mathematical way. The second part provides an introduction to the theory of Lie groups and homogeneous spaces and how it can be applied to design neural networks with desirable geometric equivariances. The lecture notes were made to be as self-contained as possible so as to be accessible to any student with a moderate mathematics background. The course also included coding tutorials and assignments in the form of a set of Jupyter notebooks that are publicly available at https://gitlab.com/bsmetsjr/mathematics_of_neural_networks.

Paper Structure (54 sections, 11 theorems, 226 equations, 23 figures)

Key Result

Lemma 1.6

Let $f \in C([0,1],\mathbb{R})$. Then for all $\varepsilon > 0$ there exists a piecewise linear function $F$ so that $\sup_{x \in [0,1]} |f(x) - F(x)| \leq \varepsilon$.
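
The approximation property behind Lemma 1.6 can be checked numerically. The sketch below assumes the lemma's conclusion is a uniform bound $|f(x) - F(x)| \leq \varepsilon$ on $[0,1]$, and builds $F$ by linear interpolation on an equispaced grid. That construction is an illustrative choice, not necessarily the one used in the notes' proof, and the helper names `piecewise_linear_interpolant`, `sup_error` and `pieces_needed` are hypothetical.

```python
import math


def piecewise_linear_interpolant(f, n):
    """Return F that interpolates f linearly at the n+1 equispaced nodes k/n."""
    nodes = [k / n for k in range(n + 1)]
    values = [f(x) for x in nodes]

    def F(x):
        # Index of the piece containing x; clamp so that x = 1 uses the last piece.
        k = min(int(x * n), n - 1)
        t = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
        return (1 - t) * values[k] + t * values[k + 1]

    return F


def sup_error(f, F, samples=10_001):
    """Estimate sup |f - F| on [0, 1] from a fine grid (an estimate, not a proof)."""
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - F(x)) for x in grid)


def pieces_needed(f, eps, max_pieces=4096):
    """Double the number of pieces until the estimated error is at most eps."""
    n = 1
    while n <= max_pieces:
        if sup_error(f, piecewise_linear_interpolant(f, n)) <= eps:
            return n
        n *= 2
    raise ValueError("no interpolant within max_pieces met the tolerance")


# Example target function (an assumption for illustration only).
f = lambda x: math.sin(2 * math.pi * x)
eps = 1e-2
n = pieces_needed(f, eps)
F = piecewise_linear_interpolant(f, n)
print(n, sup_error(f, F))
```

Doubling the number of pieces works here because a continuous function on $[0,1]$ is uniformly continuous, so the interpolation error shrinks as the grid is refined. The grid-based error estimate only samples finitely many points and therefore illustrates the bound rather than proving it.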

Figures (23)

  • Figure 1: A simplified biological neuron. The dendrites on the left receive electric signals from other neurons. Once a certain threshold is reached, the neuron fires a signal along its axon and, through its synapses on the right, relays a signal to other neurons.
  • Figure 2: Some common scalar activation functions. From left to right: the rectified linear unit, the logistic sigmoid, the hyperbolic tangent and the swish function with $\beta=1$.
  • Figure 3: Diagrammatic representation of a shallow $\mathbb{R} \to \mathbb{R}$ neural network per \ref{eq:scalarshallownetwork}. In deep learning literature the input and output of a network are often referred to as the input unit and the output unit, respectively. The intermediate values are often called the hidden units. What is commonly referred to as the width and depth of the network is also indicated.
  • Figure 4: Example of a piecewise linear function on $[0,1]$ with $4$ pieces. The locations of the inflection points are labeled with $\beta$'s and the slope of each piece is denoted with an $\alpha$.
  • Figure 5: Typical progression of the training and testing loss. The training loss will generally converge to some very low value. The testing loss either behaves in a similar fashion and converges to some higher value, as illustrated on the left, or starts to increase again at some point, as on the right. The latter indicates overfitting and tells you when to stop training.
  • ...and 18 more figures

Theorems & Definitions (98)

  • Example 1.1
  • Example 1.2: Linear least squares
  • Example 1.3: Tikhonov regularization
  • Remark 1.4: Statistical learning theory viewpoint
  • Example 1.5: Boolean gate
  • Lemma 1.6
  • proof
  • Corollary 1.7
  • Remark 1.8: Higher order methods
  • Remark 1.9: Hyperparameters
  • ...and 88 more