Mathematics of Neural Networks (Lecture Notes Graduate Course)

Bart M. N. Smets

TL;DR

These notes address the mathematical foundations of neural networks from a graduate-math perspective, formalizing supervised learning via $\hat{R}(w)=\frac{1}{N}\sum_{i=1}^N \ell(F(x_i;w),y_i)$ and exploring how regularization, initialization, and architectures influence generalization. They build from basic feed-forward models to deep networks, discuss training with SGD and momentum, and detail CNNs, backpropagation, and adaptive optimizers such as Adagrad, RMSProp, and Adam. A central contribution is the development of an equivariant framework based on Lie groups and homogeneous spaces to design rotation-translation equivariant CNNs via lifting, group convolutions, and projections, with explicit treatment of Haar measures and invariant integrals. The material connects classical geometry with modern architectures, offering a rigorous path to geometry-aware neural networks applicable in vision, physics-informed modeling, and beyond.
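The empirical risk $\hat{R}(w)$ above can be illustrated with a minimal sketch. The linear model, the squared loss, and the toy data below are assumptions chosen purely for illustration; they are not taken from the notes.

```python
def empirical_risk(model, loss, w, xs, ys):
    """Average loss of model(.; w) over the samples (x_i, y_i)."""
    n = len(xs)
    return sum(loss(model(x, w), y) for x, y in zip(xs, ys)) / n


def linear_model(x, w):
    """Hypothetical model F(x; w) = a*x + b with parameters w = (a, b)."""
    a, b = w
    return a * x + b


def squared_loss(prediction, target):
    """Squared error, one common choice for the loss in regression."""
    return (prediction - target) ** 2


# Toy data generated by y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Parameters that reproduce the data exactly give zero empirical risk.
risk_exact = empirical_risk(linear_model, squared_loss, (2.0, 1.0), xs, ys)

# Shifting the offset by 1 makes every prediction wrong by exactly 1.
risk_off = empirical_risk(linear_model, squared_loss, (2.0, 0.0), xs, ys)

print(risk_exact, risk_off)
```

Training then amounts to searching for parameters $w$ that make this average small, which is where SGD and its variants come in.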

Abstract

These are the lecture notes that accompanied the course of the same name that I taught at the Eindhoven University of Technology from 2021 to 2023. The course is intended as an introduction to neural networks for mathematics students at the graduate level and aims to make mathematics students interested in further researching neural networks. It consists of two parts. The first part is a general introduction to deep learning that focuses on introducing the field in a formal mathematical way. The second part provides an introduction to the theory of Lie groups and homogeneous spaces and how it can be applied to design neural networks with desirable geometric equivariances. The lecture notes were made to be as self-contained as possible so as to be accessible to any student with a moderate mathematics background. The course also included coding tutorials and assignments in the form of a set of Jupyter notebooks that are publicly available at https://gitlab.com/bsmetsjr/mathematics_of_neural_networks.

Paper Structure (54 sections, 11 theorems, 226 equations, 23 figures)

The Basics
Supervised Learning
The supervised learning problem
Regression & classification
Artificial Neurons & Activation Functions
Shallow Networks
Stochastic Gradient Descent
Training
Deep Learning
Deep Neural Networks
Feed Forward Networks
Vanishing and Exploding Gradients
Scaling to High Dimensional Data
Initialization
Stochastic Initialization
...and 39 more sections

Key Result

Lemma 1.6

Let $f \in C([0,1],\mathbb{R})$, then for all $\varepsilon > 0$ there exists a piecewise linear function $F$ so that $\sup_{x \in [0,1]} |f(x) - F(x)| < \varepsilon$.
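The approximation idea behind Lemma 1.6 can be explored numerically. The sketch below interpolates a continuous function on a uniform grid of $[0,1]$ and estimates the largest deviation. Uniform-grid interpolation is one possible construction and not necessarily the one used in the notes' proof; the test function, the helper names, and the sample count are assumptions made for illustration. The reported error is a maximum over finitely many sample points, so it only estimates the true supremum.

```python
import math


def piecewise_linear_interpolant(f, pieces):
    """Return F on [0, 1] that is linear on each of `pieces` equal
    subintervals and agrees with f at the subinterval endpoints."""
    knots = [k / pieces for k in range(pieces + 1)]
    values = [f(t) for t in knots]

    def F(x):
        # Index of the subinterval containing x (x = 1 goes to the last one).
        k = min(int(x * pieces), pieces - 1)
        # Local coordinate in [0, 1] within subinterval k.
        t = (x - knots[k]) * pieces
        return (1 - t) * values[k] + t * values[k + 1]

    return F


def sup_error(f, F, samples=10_001):
    """Estimate sup |f - F| on [0, 1] from equally spaced sample points."""
    points = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - F(x)) for x in points)


def f(x):
    """Example continuous function on [0, 1] (an arbitrary choice)."""
    return math.sin(2 * math.pi * x)


# More pieces should give a smaller estimated error.
errors = {n: sup_error(f, piecewise_linear_interpolant(f, n)) for n in (4, 16, 64)}
for n, err in errors.items():
    print(n, err)
```

For a given $\varepsilon$ one would keep increasing the number of pieces until the error drops below it; uniform continuity of $f$ on $[0,1]$ guarantees this eventually happens.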

Figures (23)

Figure 1: A simplified biological neuron. The dendrites on the left receive electric signals from other neurons. Once a certain threshold is reached, the neuron fires a signal along its axon and, through its synapses on the right, relays a signal to other neurons.
Figure 2: Some common scalar activation functions. From left to right: the rectified linear unit, the logistic sigmoid, the hyperbolic tangent and the swish function with $\beta=1$.
Figure 3: Diagrammatic representation of a shallow $\mathbb{R} \to \mathbb{R}$ neural network per \ref{eq:scalarshallownetwork}. In deep learning literature the input and output of a network are often referred to as the input unit and the output unit, respectively. The intermediate values are often called the hidden units. What is commonly referred to as the width and depth of the network is also indicated.
Figure 4: Example of a piecewise linear function on $[0,1]$ with $4$ pieces. The locations of the inflection points are labeled with $\beta$'s and the slope of each piece is denoted with an $\alpha$.
Figure 5: Typical progression of the training and testing loss. The training loss will generally converge to some very low value. The testing loss either behaves in a similar fashion and converges to some higher value, as illustrated on the left, or starts to increase again at some point, as on the right. The latter indicates overfitting and tells you when to stop training.
...and 18 more figures

Theorems & Definitions (98)

Example 1.1
Example 1.2: Linear least squares
Example 1.3: Tikhonov regularization
Remark 1.4: Statistical learning theory viewpoint
Example 1.5: Boolean gate
Lemma 1.6
proof
Corollary 1.7
Remark 1.8: Higher order methods
Remark 1.9: Hyperparameters
...and 88 more
