Batch Matrix-form Equations and Implementation of Multilayer Perceptrons
Wieger Wesselink, Bram Grooten, Huub van de Wetering, Qiao Xiao, Decebal Constantin Mocanu
TL;DR
This work fills a gap in the neural network literature by providing complete, explicit batch matrix-form derivations for the forward and backward passes of MLPs, including advanced layers such as batch normalization and softmax. It couples mathematical rigor with symbolic validation via SymPy and delivers uniform reference implementations across NumPy, PyTorch, JAX, TensorFlow, and a high-performance C++ backend optimized for sparsity. The key contributions are a full batch-form backpropagation derivation, symbolic gradient validation, and cross-framework implementations that illuminate the computational structure and enable efficient sparse computation. Together, these results offer a transparent, extensible foundation for teaching, research, and optimization of neural networks, particularly in sparse settings where explicit formulations reveal performance bottlenecks and guide targeted improvements.
Abstract
Multilayer perceptrons (MLPs) remain fundamental to modern deep learning, yet their algorithmic details are rarely presented in complete, explicit batch matrix-form. Instead, most references express gradients per sample or rely on automatic differentiation. Although automatic differentiation can achieve equally high computational efficiency, batch matrix-form makes the computational structure explicit, which is essential for transparent, systematic analysis and optimization in settings such as sparse neural networks. This paper fills that gap by providing a mathematically rigorous and implementation-ready specification of MLPs in batch matrix-form. We derive forward and backward equations for all standard and advanced layers, including batch normalization and softmax, and validate all equations using the symbolic mathematics library SymPy. From these specifications, we construct uniform reference implementations in NumPy, PyTorch, JAX, TensorFlow, and a high-performance C++ backend optimized for sparse operations. Our main contributions are: (1) a complete derivation of batch matrix-form backpropagation for MLPs, (2) symbolic validation of all gradient equations, (3) uniform Python and C++ reference implementations grounded in a small set of matrix primitives, and (4) a demonstration of how explicit formulations enable efficient sparse computation. Together, these results establish a validated, extensible foundation for understanding, teaching, and researching neural network algorithms.
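To make "batch matrix-form" concrete, the following is a minimal NumPy sketch of the forward and backward pass of a single linear layer, written for a whole batch at once. It is an illustration under stated assumptions, not the paper's reference implementation: the row-wise layout (one sample per row of X), the weight shape (K, D), the squared-error loss, and all function names are hypothetical choices made here. The gradients are checked with central finite differences, which stands in for the SymPy-based symbolic validation described in the abstract.

```python
import numpy as np

# Assumed layout (hypothetical, chosen for this sketch):
#   X: (N, D) batch of inputs, one sample per row
#   W: (K, D) weights, b: (K,) bias, Y: (N, K) outputs

def linear_forward(X, W, b):
    """Batch forward pass: Y = X W^T + 1_N b (bias broadcast over rows)."""
    return X @ W.T + b

def linear_backward(X, W, DY):
    """Batch backward pass given DY = dL/dY of shape (N, K)."""
    DX = DY @ W          # dL/dX, shape (N, D)
    DW = DY.T @ X        # dL/dW, shape (K, D)
    Db = DY.sum(axis=0)  # dL/db, shape (K,): column sums of DY
    return DX, DW, Db

def numerical_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of the scalar f() w.r.t. array A.

    A is perturbed in place and restored after each entry.
    """
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        f_plus = f()
        A[idx] = old - eps
        f_minus = f()
        A[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
N, D, K = 4, 3, 2
X = rng.normal(size=(N, D))
W = rng.normal(size=(K, D))
b = rng.normal(size=K)
T = rng.normal(size=(N, K))  # arbitrary targets for the example loss

def loss():
    """Squared-error loss L = 0.5 * ||Y - T||^2, so that dL/dY = Y - T."""
    Y = linear_forward(X, W, b)
    return 0.5 * np.sum((Y - T) ** 2)

DY = linear_forward(X, W, b) - T
DX, DW, Db = linear_backward(X, W, DY)

# Compare the matrix-form gradients against finite differences.
print(np.allclose(DX, numerical_grad(loss, X), atol=1e-5))
print(np.allclose(DW, numerical_grad(loss, W), atol=1e-5))
print(np.allclose(Db, numerical_grad(loss, b), atol=1e-5))
```

Each gradient is a single matrix product or reduction over the batch, with no per-sample loop. This is the kind of structure that becomes visible in batch matrix-form: for example, if W were stored as a sparse matrix, the product DY @ W is the operation a sparse backend would need to handle efficiently.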
