Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
TL;DR
This work reframes dying neurons as a resource for pruning by introducing Demon Pruning (DemP), a dense-to-sparse training method that actively promotes neuron saturation through scheduled regularization of normalization scale parameters and asymmetric noise added to live weights. DemP prunes dead neurons on the fly during training, yielding highly structured sparsity with minimal performance loss and substantial training speedups on CIFAR-10, ImageNet, and transformer-like models. The method achieves better accuracy-sparsity tradeoffs than strong dense-to-sparse baselines, especially with Adam, and is compatible with existing pruning techniques, offering a practical approach to efficient model compression. Theoretical and empirical analysis links neuron death to SGD noise and hyperparameters, and ablations validate the design choices. A broader-impacts discussion covers energy efficiency and responsible AI considerations.
Abstract
When training neural networks, dying neurons (units that become inactive or saturated) are traditionally seen as harmful. This paper sheds new light on this phenomenon. By exploring how various hyperparameter configurations affect dying neurons during training, we gather insights on how to improve sparse training approaches to pruning. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons by combining noise injection on active units with a one-cycle regularization schedule, dynamically leading to network sparsity. Experiments on CIFAR-10 and ImageNet demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56$\times$. These findings provide a novel perspective on dying neurons as a resource for efficient model compression and optimization.
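To make the idea of pruning dead neurons concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a two-layer ReLU network and treats a hidden neuron as dead when its activation is zero on every input of a reference batch; that criterion, the helper names (`dead_neurons`, `prune`, `one_cycle`), and the linear ramp-up and ramp-down shape of the schedule are all illustrative assumptions. Noise injection on active units is not shown.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(W1, b1, batch):
    """ReLU hidden activations: one row per input, one column per neuron."""
    return [
        [relu(sum(w * x for w, x in zip(row, inp)) + b) for row, b in zip(W1, b1)]
        for inp in batch
    ]

def dead_neurons(W1, b1, batch):
    """Indices of hidden neurons that output zero on every input in the batch.

    Assumption: 'dead' is judged on this batch only; a neuron flagged here
    could still fire on inputs outside the batch.
    """
    acts = hidden_activations(W1, b1, batch)
    return [j for j in range(len(W1)) if all(a[j] == 0.0 for a in acts)]

def prune(W1, b1, W2, dead):
    """Structured pruning: drop dead neurons' rows in W1/b1 and columns in W2."""
    dead = set(dead)
    keep = [j for j in range(len(W1)) if j not in dead]
    return (
        [W1[j] for j in keep],
        [b1[j] for j in keep],
        [[row[j] for j in keep] for row in W2],
    )

def forward(W1, b1, W2, inp):
    """Output of the two-layer network for a single input."""
    h = hidden_activations(W1, b1, [inp])[0]
    return [sum(w * a for w, a in zip(row, h)) for row in W2]

def one_cycle(step, total_steps, peak):
    """Hypothetical one-cycle regularization strength: linear 0 -> peak -> 0."""
    half = total_steps / 2
    if step <= half:
        return peak * step / half
    return peak * (total_steps - step) / half

# Toy network: 2 inputs, 3 hidden neurons, 1 output.
W1 = [[1.0, -1.0], [-1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0, 0.0]
W2 = [[1.0, 2.0, 3.0]]
batch = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

dead = dead_neurons(W1, b1, batch)
W1p, b1p, W2p = prune(W1, b1, W2, dead)

# Removing a neuron that never fires leaves the outputs on this batch unchanged.
before = [forward(W1, b1, W2, x) for x in batch]
after = [forward(W1p, b1p, W2p, x) for x in batch]
```

In this toy network the second hidden neuron has negative weights and a negative bias, so it never fires on the non-negative inputs in the batch. Removing it shrinks both weight matrices while leaving the outputs on that batch unchanged, which is why pruning dead units yields structured sparsity at little cost in accuracy.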
