
Batch Matrix-form Equations and Implementation of Multilayer Perceptrons

Wieger Wesselink, Bram Grooten, Huub van de Wetering, Qiao Xiao, Decebal Constantin Mocanu

TL;DR

This work fills a gap in the neural network literature by providing complete, explicit batch matrix-form derivations for forward and backward passes of MLPs, including advanced layers like batch normalization and softmax. It couples mathematical rigor with symbolic validation via SymPy and delivers uniform reference implementations across NumPy, PyTorch, JAX, TensorFlow, and a high-performance C++ backend optimized for sparsity. The key contributions are a full batch-form backpropagation derivation, symbolic gradient validation, and cross-framework implementations that illuminate the computational structure and enable efficient sparse computation. Together, these results offer a transparent, extensible foundation for teaching, research, and optimization of neural networks, particularly in sparse settings where explicit formulations reveal performance bottlenecks and guide targeted improvements.
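The following is a minimal NumPy sketch of one batch matrix-form backward equation, for a softmax layer. It assumes examples are stored as rows of an N × K matrix, as in Figure 1. The function names are hypothetical and are not taken from the paper's reference implementations. A central finite-difference check stands in here for the SymPy validation the paper describes.

```python
import numpy as np

def softmax(X: np.ndarray) -> np.ndarray:
    """Row-wise softmax of an N x K batch; subtracting the row max avoids overflow."""
    E = np.exp(X - X.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def softmax_backward(Y: np.ndarray, DY: np.ndarray) -> np.ndarray:
    """Given Y = softmax(X) and DY = dL/dY (both N x K), return DX = dL/dX.

    Per row, dx = (diag(y) - y y^T) dy; for the whole batch this becomes
    DX = Y * (DY - rowsum(DY * Y)), with no per-sample Jacobian built.
    """
    return Y * (DY - np.sum(DY * Y, axis=1, keepdims=True))

def numerical_gradient(f, X: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central finite differences of a scalar function f with respect to X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))    # N = 4 examples, K = 3 classes
DY = rng.standard_normal((4, 3))   # an arbitrary upstream gradient

# L(X) = sum(DY * softmax(X)) has dL/dY = DY, so its gradient with respect
# to X must coincide with softmax_backward(softmax(X), DY).
DX = softmax_backward(softmax(X), DY)
DX_num = numerical_gradient(lambda Z: np.sum(DY * softmax(Z)), X)
print(np.allclose(DX, DX_num, atol=1e-6))
```

The batch expression uses only elementwise products and one row sum, so its cost is linear in N · K.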

Abstract

Multilayer perceptrons (MLPs) remain fundamental to modern deep learning, yet their algorithmic details are rarely presented in complete, explicit batch matrix-form. Instead, most references express gradients per sample or rely on automatic differentiation. Although automatic differentiation can achieve equally high computational efficiency, using batch matrix-form makes the computational structure explicit, which is essential for transparent, systematic analysis and for optimization in settings such as sparse neural networks. This paper fills that gap by providing a mathematically rigorous and implementation-ready specification of MLPs in batch matrix-form. We derive forward and backward equations for all standard and advanced layers, including batch normalization and softmax, and validate all equations using the symbolic mathematics library SymPy. From these specifications, we construct uniform reference implementations in NumPy, PyTorch, JAX, TensorFlow, and a high-performance C++ backend optimized for sparse operations. Our main contributions are: (1) a complete derivation of batch matrix-form backpropagation for MLPs, (2) symbolic validation of all gradient equations, (3) uniform Python and C++ reference implementations grounded in a small set of matrix primitives, and (4) demonstration of how explicit formulations enable efficient sparse computation. Together, these results establish a validated, extensible foundation for understanding, teaching, and researching neural network algorithms.
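The following NumPy sketch contrasts per-sample and batch matrix-form gradients for a linear layer. It assumes the convention $Y = X W^\top + \mathbf{1} b^\top$, with $X$ of size N × D as in Figure 1 and $W$ of size K × D. The paper's own orientation may differ. The loop accumulates one outer product per example, while the batch form obtains the same quantities with a single matrix product each.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 8, 5, 3
X = rng.standard_normal((N, D))    # mini-batch: one example per row
DY = rng.standard_normal((N, K))   # gradient of the loss w.r.t. the layer output
W = rng.standard_normal((K, D))    # assumed convention: Y = X W^T + 1 b^T

# Per-sample form: one outer product dy_n x_n^T (and one W^T dy_n) per example.
DW_loop = np.zeros((K, D))
Db_loop = np.zeros(K)
for x_n, dy_n in zip(X, DY):
    DW_loop += np.outer(dy_n, x_n)
    Db_loop += dy_n
DX_loop = np.stack([W.T @ dy_n for dy_n in DY])   # N x D

# Batch matrix-form: the same quantities as single matrix expressions.
DW_batch = DY.T @ X        # K x D
Db_batch = DY.sum(axis=0)  # DY^T 1
DX_batch = DY @ W          # N x D, passed back to the previous layer

print(np.allclose(DW_loop, DW_batch),
      np.allclose(Db_loop, Db_batch),
      np.allclose(DX_loop, DX_batch))
```

In the batch form, each gradient is one call to a matrix-multiplication primitive. When $W$ is stored sparsely, this is exactly where a sparse product would be substituted.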

Paper Structure

This paper contains 42 sections, 2 theorems, 68 equations, 3 figures, and 10 tables.

Key Result

Lemma 1

The product rule for scalar functions $u$ and $v$ is given by $(uv)' = u'v + uv'$. It can be generalized to vector functions, but the result is sensitive to the orientation of the operands. Below we give four concrete applications of the product rule for vector functions. Let $x \in \mathbb{R}^{p}$, $A \in \mathbb{R}^{m \times n}$, and $h(x) = f(x) g(x)$ for $m, n, p \in \mathbb{N}$.
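The lemma's four cases are cut off here, so the following sketch numerically checks just one illustrative case. It takes a scalar $f : \mathbb{R}^p \to \mathbb{R}$ and a vector $g : \mathbb{R}^p \to \mathbb{R}^m$, and assumes the Jacobian convention $(J_h)_{ij} = \partial h_i / \partial x_j$. Under that convention, $J_h = g \, \nabla f^\top + f \, J_g$. The specific $f$, $g$ and the matrix `M` are hypothetical choices, and the paper may orient these objects differently.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 4, 3
M = rng.standard_normal((m, p))   # hypothetical matrix, unrelated to the lemma's A

def f(x):
    """Scalar function f(x) = sum(sin(x)); its gradient is cos(x)."""
    return np.sum(np.sin(x))

def g(x):
    """Vector function g(x) = M x; its Jacobian is M."""
    return M @ x

def h(x):
    """Product h(x) = f(x) g(x), a vector in R^m."""
    return f(x) * g(x)

def jacobian_fd(fun, x, eps=1e-6):
    """Central finite-difference Jacobian with entries d fun_i / d x_j."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((fun(x + e) - fun(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)   # m x p

x = rng.standard_normal(p)
J_rule = np.outer(g(x), np.cos(x)) + f(x) * M   # g grad(f)^T + f J_g
print(np.allclose(J_rule, jacobian_fd(h, x), atol=1e-6))
```

Swapping the outer product to `np.outer(np.cos(x), g(x))` yields a p × m matrix, which is the transposed (gradient-style) orientation. This is the kind of sensitivity to operand orientation the lemma warns about.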

Figures (3)

  • Figure 1: Graphical representation of a multilayer perceptron. The input is a mini-batch, represented by an $\mathrm{N} \times \mathrm{D}$ matrix consisting of $\mathrm{N}$ examples with $\mathrm{D}$ features. In the feedforward pass this input is passed through an MLP with a series of layers, represented by vertical bars. This results in an $\mathrm{N} \times \mathrm{K}$ output matrix $Y$. From the output $Y$ and the expected output $T$ the loss $\mathcal{L}(Y, T)$ is computed.
  • Figure 2: A multilayer perceptron with three hidden layers. The input vector $x$ has $\mathrm{D}$ elements, and the output vector $y$ has $\mathrm{K}$ elements. Neurons in each layer are represented by circles, and connections between neurons are indicated by arrows. The weights associated with the connections form the parameters of the linear layers.
  • Figure 3: Data flow of a layer. A layer stores the input $X$ and the layer parameters $\theta$ (e.g., the weight matrix $W$ and the bias vector $b$ in case of a linear layer), and the corresponding gradients $\textsf{D} X$ and $\textsf{D} \theta$ with respect to the loss function. In the feedforward step the output $Y = f_\theta(X)$ is calculated and forwarded to the next layer. In the backpropagation step the output $Y$ and its gradient $\textsf{D} Y$ are obtained from the next layer and used to calculate the gradients $\textsf{D} X$ and $\textsf{D} \theta$. $X$ and $\textsf{D} X$ are then passed back to the previous layer. In the optimization step the gradient $\textsf{D} \theta$ is used to update the value $\theta$.
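The following NumPy sketch mirrors the data flow of Figure 3. Each layer stores its input $X$ in the feedforward step. In the backpropagation step it receives $\textsf{D} Y$ and returns $\textsf{D} X$ while recording $\textsf{D} \theta$. It then updates $\theta$ in the optimization step. The class names, the plain gradient-descent update, the mean-squared-error loss and the layer sizes are all illustrative assumptions, not the paper's code.

```python
import numpy as np

class Linear:
    """Hypothetical linear layer Y = X W^T + b with parameters theta = (W, b)."""

    def __init__(self, D: int, K: int, rng: np.random.Generator):
        self.W = rng.standard_normal((K, D)) / np.sqrt(D)
        self.b = np.zeros(K)

    def feedforward(self, X):
        self.X = X                        # store the input for backpropagation
        return X @ self.W.T + self.b

    def backpropagate(self, DY):
        self.DW = DY.T @ self.X           # D theta: parameter gradients
        self.Db = DY.sum(axis=0)
        return DY @ self.W                # D X: handed back to the previous layer

    def optimize(self, lr):
        self.W -= lr * self.DW            # plain gradient descent (an assumption)
        self.b -= lr * self.Db

class ReLU:
    """Parameter-free layer Y = max(X, 0), applied elementwise."""

    def feedforward(self, X):
        self.X = X
        return np.maximum(X, 0.0)

    def backpropagate(self, DY):
        return DY * (self.X > 0)

    def optimize(self, lr):
        pass                              # no parameters to update

def mse(Y, T):
    """Loss 0.5 * ||Y - T||^2 / N and its gradient DY = dL/dY."""
    N = Y.shape[0]
    return 0.5 * np.sum((Y - T) ** 2) / N, (Y - T) / N

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))          # N = 16 examples, D = 4 features
T = rng.standard_normal((16, 2))          # K = 2 targets per example
layers = [Linear(4, 8, rng), ReLU(), Linear(8, 2, rng)]

def train_step(lr=0.05):
    Y = X
    for layer in layers:                  # feedforward step
        Y = layer.feedforward(Y)
    loss, DY = mse(Y, T)
    for layer in reversed(layers):        # backpropagation step
        DY = layer.backpropagate(DY)
    for layer in layers:                  # optimization step
        layer.optimize(lr)
    return loss

losses = [train_step() for _ in range(200)]
print(losses[0] > losses[-1])
```

Keeping the three steps as separate loops makes the ordering of Figure 3 explicit. Forward runs over the layers in order, backward runs in reverse, and updates happen only after all gradients are known.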

Theorems & Definitions (5)

  • Definition 1: Gradient
  • Definition 2: Gradient of the loss function
  • Definition 3: Jacobian
  • Lemma 1: Product Rule for vector functions
  • Lemma 2: Chain rule for vector functions