Nerva: a Truly Sparse Implementation of Neural Networks
Wieger Wesselink, Bram Grooten, Qiao Xiao, Cassio de Campos, Mykola Pechenizkiy
TL;DR
Nerva tackles the challenge of training and deploying truly sparse neural networks with a C++ library that relies on the sparse matrix operations of Intel MKL. It stores weights in CSR (compressed sparse row) format instead of simulating sparsity with binary masks. Runtime decreases linearly as sparsity increases, accuracy remains comparable to PyTorch for an MLP on CIFAR-10, and CSR storage yields substantial memory savings. In extensive CPU-based experiments, the paper reports up to ~4× faster training at high sparsity and significantly faster inference, with results that are robust across random seeds and scale well to moderate model sizes. The work highlights the practical value of sparse models on commodity hardware and identifies dynamic sparse training and GPU acceleration as future directions.
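To make the memory argument concrete, here is a rough estimate for a single 1000×1000 layer at 99% sparsity. It assumes 4-byte values and 4-byte integer indices; these sizes, the layer shape, and the function names are illustrative assumptions, not figures from the paper.

```python
# Hypothetical back-of-the-envelope estimate (not taken from the paper):
# assumes 4-byte float values and 4-byte integer indices.

def dense_bytes(rows: int, cols: int, value_bytes: int = 4) -> int:
    """Memory for a dense weight matrix; mask-based sparsity stores at least this much."""
    return rows * cols * value_bytes


def csr_bytes(rows: int, nnz: int, value_bytes: int = 4, index_bytes: int = 4) -> int:
    """Memory for CSR: nonzero values, their column indices, and rows + 1 row pointers."""
    return nnz * value_bytes + nnz * index_bytes + (rows + 1) * index_bytes


rows, cols = 1000, 1000
sparsity_percent = 99                                # 99% of the weights are zero
nnz = rows * cols * (100 - sparsity_percent) // 100  # 10,000 stored weights

dense = dense_bytes(rows, cols)
csr = csr_bytes(rows, nnz)
print(dense, csr, round(dense / csr, 1))  # prints: 4000000 84004 47.6
```

CSR storage grows with the number of nonzeros rather than with rows × cols, so the savings increase with sparsity. At low sparsity the index overhead can make CSR larger than the dense matrix (at 50% sparsity this estimate gives 4,004,004 bytes versus 4,000,000).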
Abstract
We introduce Nerva, a fast neural network library under development in C++. It supports sparsity by using the sparse matrix operations of Intel's Math Kernel Library (MKL), which eliminates the need for binary masks. We show that Nerva significantly decreases training time and memory usage while reaching accuracy equivalent to PyTorch. We run static sparse training experiments with an MLP on CIFAR-10. At high sparsity levels such as $99\%$, the runtime is reduced by a factor of $4$ compared to a PyTorch model using masks. Similar to other popular frameworks such as PyTorch and Keras, Nerva offers a Python interface for users to work with.
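The difference between masking and true sparsity can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Nerva's or MKL's API: the function names are hypothetical, and a real implementation would call an optimized CSR kernel such as MKL's sparse BLAS routines. The masked version still performs a multiply-add for every entry, while the CSR version touches only the stored nonzeros.

```python
# Illustrative sketch (hypothetical names, not Nerva's API):
# masked dense matrix-vector product versus a CSR matrix-vector product.

def masked_dense_matvec(W, M, x):
    """y = (W * M) x; visits every entry, zeros included. Returns (y, multiply_adds)."""
    ops, y = 0, []
    for w_row, m_row in zip(W, M):
        acc = 0.0
        for w, m, xj in zip(w_row, m_row, x):
            acc += (w * m) * xj  # masked-out weights still cost a multiply-add
            ops += 1
        y.append(acc)
    return y, ops


def to_csr(W, M):
    """Keep only unmasked weights: CSR values, column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for w_row, m_row in zip(W, M):
        for j, (w, m) in enumerate(zip(w_row, m_row)):
            if m:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's slice in values/col_idx
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = W x using only the stored nonzeros. Returns (y, multiply_adds)."""
    ops, y = 0, []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
            ops += 1
        y.append(acc)
    return y, ops


W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]
M = [[1, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]
x = [1.0, 1.0, 1.0, 1.0]

y_mask, ops_mask = masked_dense_matvec(W, M, x)
y_csr, ops_csr = csr_matvec(*to_csr(W, M), x)
print(y_csr, ops_mask, ops_csr)  # prints: [5.0, 7.0, 10.0] 12 4
```

Both products give the same result, but the CSR product performs one multiply-add per nonzero, so its cost shrinks as sparsity increases, whereas the masked product costs the same at every sparsity level.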
