Nerva: a Truly Sparse Implementation of Neural Networks

Wieger Wesselink; Bram Grooten; Qiao Xiao; Cassio de Campos; Mykola Pechenizkiy

TL;DR

Nerva is a C++ library for training and deploying truly sparse neural networks. Instead of simulating sparsity with binary masks, it stores weights in compressed sparse row (CSR) format and uses the sparse matrix operations of Intel's MKL. Runtime decreases linearly as sparsity increases, accuracy stays comparable to PyTorch for an MLP on CIFAR-10, and CSR storage yields substantial memory savings. In CPU-based experiments, training is up to ~4× faster at high sparsity and inference is significantly faster; results are robust across seeds, and the approach scales to moderate model sizes. The work has practical implications for deploying sparse models on commodity hardware and points to dynamic sparse training and GPU acceleration as future directions.
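To make the runtime and memory claims above more concrete, here is a back-of-the-envelope sketch that counts stored bytes and multiply-adds for a single weight matrix. The layer shape (1024 × 3072), 4-byte values, 4-byte indices, and a one-byte-per-entry mask are assumptions chosen for illustration, not figures from the paper, and `layer_costs` is a hypothetical helper.

```python
def layer_costs(rows, cols, sparsity, value_bytes=4, index_bytes=4):
    """Estimate storage and multiply-adds for one weight matrix.

    Assumed for illustration: float32 values, 32-bit indices, and a
    one-byte-per-entry binary mask for the masked dense variant.
    """
    total = rows * cols
    nnz = round(total * (1.0 - sparsity))  # weights that are actually kept

    # Masked dense: every weight is stored and multiplied, plus the mask.
    dense_bytes = total * value_bytes + total
    dense_macs = total

    # CSR: one value and one column index per nonzero, plus row pointers.
    csr_bytes = nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes
    csr_macs = nnz

    return {"dense_bytes": dense_bytes, "dense_macs": dense_macs,
            "csr_bytes": csr_bytes, "csr_macs": csr_macs}


for s in (0.5, 0.9, 0.99):
    c = layer_costs(1024, 3072, s)
    print(f"sparsity={s:.2f}  "
          f"macs: {c['csr_macs'] / c['dense_macs']:.2f} of dense  "
          f"bytes: {c['csr_bytes'] / c['dense_bytes']:.3f} of dense")
```

Operation counts are only a proxy for wall-clock time. The reported training speedup at 99% sparsity is about 4×, well below the 100× drop in multiply-adds, which suggests that other costs also contribute to the measured runtime.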

Abstract

We introduce Nerva, a fast neural network library under development in C++. It supports sparsity by using the sparse matrix operations of Intel's Math Kernel Library (MKL), which eliminates the need for binary masks. We show that Nerva significantly decreases training time and memory usage while reaching accuracy equivalent to PyTorch. We run static sparse experiments with an MLP on CIFAR-10. At high sparsity levels such as $99\%$, the runtime is reduced by a factor of $4$ compared to a PyTorch model using masks. Like other popular frameworks such as PyTorch and Keras, Nerva offers a Python interface.
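The difference between a binary mask and a truly sparse representation can be illustrated with a small pure-Python sketch. This is not Nerva's code, which delegates to MKL's sparse routines; the helper names `dense_to_csr`, `csr_matvec`, and `masked_matvec` and the toy matrix are hypothetical and chosen only for illustration.

```python
def dense_to_csr(matrix):
    """Convert a list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W x, touching only the stored nonzeros of W."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def masked_matvec(weights, mask, x):
    """Compute y = (W * mask) x, multiplying every entry, zeros included."""
    return [sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]


weights = [[0.5, -1.0, 2.0],
           [1.5, 0.25, -0.5]]
mask = [[1, 0, 1],
        [0, 1, 0]]
x = [2.0, 4.0, 1.0]

# Apply the mask once, then keep only the surviving weights in CSR form.
pruned = [[w * m for w, m in zip(w_row, m_row)]
          for w_row, m_row in zip(weights, mask)]
values, col_idx, row_ptr = dense_to_csr(pruned)

y_sparse = csr_matvec(values, col_idx, row_ptr, x)
y_masked = masked_matvec(weights, mask, x)
assert y_sparse == y_masked  # same result, fewer multiplications
```

Both variants produce the same output, but the CSR loop performs three multiplications here while the masked version performs all six (plus the mask multiplications), and the gap widens as sparsity grows.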

Paper Structure (21 sections, 4 equations, 6 figures, 4 tables)

Introduction
Related Work
Sparse Training
Truly Sparse Implementations
Background
Implementation
Experiments
Experimental setup
Equivalent accuracy
Decreased training time
Decreased inference time
Scalability
Memory
Discussion and Conclusion
Limitations & Future Work
...and 6 more sections

Figures (6)

Figure 1: Accuracy vs sparsity. Note the logit scale on the horizontal axis: values closer to 1 are stretched out. The accuracies of Nerva and PyTorch are similar, except in the high-sparsity regime, where Nerva outperforms PyTorch. The reason for this is not yet known.
Figure 2: The total training time of 100 epochs for CIFAR-10, on a regular desktop with 4 CPU cores. As the sparsity level increases, the running time of Nerva goes down linearly, as it takes advantage of sparse matrix operations. The running time for PyTorch stays roughly constant, because it uses binary masks.
Figure 3: Inference time vs sparsity. The graph shows the average inference time for one CIFAR-10 example in milliseconds, on a regular desktop with 4 CPU cores. As in Figure 1, a logit scale is used. The inference time of Nerva is significantly lower, especially at higher sparsity levels.
Figure 4: Accuracy vs Epoch. The comparison of the test and training accuracy of Nerva and PyTorch during training on CIFAR-10 with various sparsity levels, over three runs with different seeds.
Figure 5: Loss vs Epoch. The comparison of learning curves of Nerva and PyTorch during training on CIFAR-10 with various sparsity levels, over three runs with different seeds.
...and 1 more figure
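The logit scale mentioned in the captions of Figures 1 and 3 maps a sparsity level $p$ in $(0, 1)$ to $\log(p / (1 - p))$. The short sketch below, with sparsity levels picked purely for illustration, shows why values close to 1 are stretched out on such an axis.

```python
import math


def logit(p):
    """Map a value in (0, 1) to the real line: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Illustrative sparsity levels; each step keeps about 10x fewer weights.
sparsities = [0.5, 0.9, 0.99, 0.999]
positions = [logit(s) for s in sparsities]

for s, pos in zip(sparsities, positions):
    print(f"sparsity={s:<6} axis position={pos:.2f}")
```

On a linear axis, 0.99 and 0.999 would be nearly indistinguishable. On the logit axis they are roughly as far apart as 0.5 and 0.9, so the high-sparsity regime becomes readable.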
