Enhancing Accuracy in Deep Learning Using Random Matrix Theory

Leonid Berlyand; Etienne Sandier; Yitzchak Shmalo; Lei Zhang

TL;DR

This work explores applications of random matrix theory (RMT) to the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights). It supports the numerical results with a rigorous mathematical underpinning by proving the RMT-based Pruning Theorem.

Abstract

We explore applications of random matrix theory (RMT) to the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights). Our numerical results show that this pruning drastically reduces the number of parameters without reducing the accuracy of DNNs and CNNs. Moreover, pruning fully connected DNNs actually increases the accuracy and decreases the variance across random initializations. Our numerics indicate that this gain in accuracy is due to a simplification of the loss landscape. We then provide a rigorous mathematical underpinning for these numerical results by proving the RMT-based Pruning Theorem. Our results offer valuable insights into the practical application of RMT for building more efficient and accurate deep-learning models.

Paper Structure (60 sections, 13 theorems, 133 equations, 26 figures, 7 tables, 4 algorithms)

Introduction
Acknowledgements
Background on Deep Learning
Numerical Algorithm and Experiments
Numerical Algorithm
An overview of the Marchenko-Pastur (MP) distribution and its applications in machine learning
Using MP for pruning DNN weights
MP and Tracy Widom distribution for DNN training
Numerical Experiments
Training of fully connected DNNs on MNIST: simplifying the loss landscape
Simplification of loss landscape for more efficient training
Simplifying the loss landscape for fully connected DNNs on Fashion MNIST
MP-based pruning with sparsification for fully connected DNNs on Fashion MNIST
MP-based pruning of CNNs on MNIST and Fashion MNIST
MP-based pruning with sparsification for CNN trained on Fashion MNIST
...and 45 more sections

Key Result

Theorem 4.2

Let $W$ be an $N\times M$ random matrix with $M \leq N$. The entries $W_{i,j}$ are independent and identically distributed random variables with mean $0$ and variance $\sigma^2<\infty$. Define $X = \frac{1}{N} W^T W$. Assuming that $N \to \infty$ and $\frac{M}{N} \to c \in (0,+\infty)$, the ESD of $X$ converges to the Marchenko-Pastur distribution.
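
The setting of Theorem 4.2 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the standard form of the Marchenko-Pastur density for $c \leq 1$, $\rho(x) = \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{2\pi\sigma^2 c\, x}$ on $[\lambda_-, \lambda_+]$ with $\lambda_\pm = \sigma^2(1 \pm \sqrt{c})^2$, and it draws Gaussian entries for $W$. The function names `mp_edges`, `mp_density`, and `empirical_eigenvalues` and the sizes $N = 1000$, $M = 250$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def mp_edges(c, sigma2=1.0):
    """Support edges of the Marchenko-Pastur law for ratio c = M/N <= 1 (standard form, assumed)."""
    lam_minus = sigma2 * (1.0 - np.sqrt(c)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(c)) ** 2
    return lam_minus, lam_plus

def mp_density(x, c, sigma2=1.0):
    """Marchenko-Pastur density evaluated at x; zero outside the support."""
    lam_minus, lam_plus = mp_edges(c, sigma2)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    xi = x[inside]
    dens[inside] = np.sqrt((lam_plus - xi) * (xi - lam_minus)) / (2.0 * np.pi * sigma2 * c * xi)
    return dens

def empirical_eigenvalues(N, M, sigma=1.0, seed=0):
    """Eigenvalues of X = W^T W / N for an N x M matrix W with i.i.d. N(0, sigma^2) entries."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma, size=(N, M))
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

# Illustrative sizes (hypothetical, chosen only to keep the demo fast).
N, M = 1000, 250
c = M / N
eigs = empirical_eigenvalues(N, M)
lam_minus, lam_plus = mp_edges(c)

# Compare the empirical histogram with the limiting density on a common grid.
hist, bin_edges = np.histogram(eigs, bins=40, range=(lam_minus, lam_plus), density=True)
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
max_gap = np.max(np.abs(hist - mp_density(centers, c)))

print("MP support:", (lam_minus, lam_plus))
print("empirical eigenvalue range:", (eigs.min(), eigs.max()))
print("largest histogram/density gap:", max_gap)
```

For finite $N$ the extreme eigenvalues fall slightly off the limiting edges $\lambda_\pm$, so the comparison is only approximate; increasing $N$ with $M/N$ held fixed should tighten the agreement.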

Figures (26)

Figure 1: Comparison of Normal DNN, trained normally, and pruned DNN, trained using the RMT approach on the test set. The sub-figures correspond to the different initial topologies: (a) $[784, 3000,3000,2000, 500, 10]$, (b) $[784, 1000,1000,1000, 500, 10]$, (c) $[784, 2000,2000,1000, 500, 10]$, (d) $[784, 1500,3000,1500,500, 10]$, and (e) $[784, 1000,1000,1000, 500, 10]$ with a larger goodness-of-fit parameter of 1.
Figure 2: Comparison of Normal DNN, trained normally, and pruned DNN, trained using the RMT approach on the training set. The sub-figures correspond to the different initial topologies: (a) $[784, 2000,2000,1000, 500, 10]$, (b) $[784, 1500,3000,1500,500, 10]$. The other examples in Fig. 1 have similar-looking accuracies on their training set.
Figure 3: Comparison of Normal DNN, trained normally, and pruned DNN, trained using the RMT approach on the test set. The sub-figures correspond to the different initial topologies: (a) $[784, 2000,4000,2000,500, 10]$, (b) $[784, 2000,2000,2000,2000,1000, 500, 10]$, (c) $[784, 3000,4000,3000,500, 10]$.
Figure 4: Comparison of Normal DNN, trained normally, and pruned DNN, trained using the RMT approach on the training set. The sub-figures correspond to the different initial topologies: (a) $[784, 3000,4000,3000,500, 10]$, (b) $[784, 2000,2000,2000,2000,1000, 500, 10]$.
Figure 5: Analysis of DNN Training and Pruning
...and 21 more figures

Theorems & Definitions (55)

Definition 4.1
Theorem 4.2: Marchenko and Pastur (1967)
Remark 4.1
Remark 4.2
Example 4.3
Remark 4.3
Remark 4.4
Example 4.4
Remark 4.5
Example 4.5
...and 45 more
