A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip H. S. Torr
TL;DR
This work analyzes pruning neural networks at initialization through a signal propagation lens, formalizing initialization conditions that enable reliable connection sensitivity measurements via layerwise dynamical isometry. It shows that faithful gradient propagation, governed by the Jacobians, is essential for effective pruning and that pruning can disrupt dynamical isometry in sparse networks. To mitigate this, the authors propose a data-free method to recover approximate dynamical isometry (LDI-AI), improving trainability of pruned networks across architectures and datasets. They also demonstrate unsupervised pruning and neural architecture sculpting, revealing that architectures sculpted from oversized networks can outperform hand-designed baselines under the same parameter budget. Overall, the paper provides a principled framework linking initialization, pruning, and trainability, with practical implications for scalable, sparse neural networks and potential routes toward initializations that resemble winning lottery tickets.
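To make the claim that pruning can disrupt dynamical isometry concrete, here is a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: a single linear layer (so the layer Jacobian is just the weight matrix, with no nonlinearity), a random binary mask standing in for a real pruning criterion, and hypothetical helper names `orthogonal_matrix` and `singular_value_spread`. It only illustrates that an orthogonal weight matrix has all singular values equal to 1, and that masking it spreads them out.

```python
import numpy as np

def orthogonal_matrix(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using diag(r); the result is still orthogonal.
    return q * np.sign(np.diag(r))

def singular_value_spread(w):
    """Return (min, max) singular values of a layer Jacobian."""
    s = np.linalg.svd(w, compute_uv=False)
    return s.min(), s.max()

rng = np.random.default_rng(0)
w = orthogonal_matrix(64, rng)

# Assumption: a random mask keeping roughly 10% of connections,
# used here only as a stand-in for an actual pruning criterion.
mask = rng.random(w.shape) < 0.1

dense_spread = singular_value_spread(w)          # both values close to 1
sparse_spread = singular_value_spread(mask * w)  # values no longer concentrated at 1

print("dense  (min, max):", dense_spread)
print("sparse (min, max):", sparse_spread)
```

In the dense case the minimum and maximum singular values coincide, which is the isometry condition for this toy layer; after masking they separate, which is the kind of degradation a recovery step would aim to undo.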
Abstract
Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and then removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a saliency criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by noting connection sensitivity as a form of gradient, we formally characterize initialization conditions to ensure reliable connection sensitivity measurements, which in turn yields effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures.
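The abstract's remark that connection sensitivity is a form of gradient can be illustrated with a small sketch. It assumes the definition used in earlier pruning-at-initialization work: the normalized magnitude of the loss gradient with respect to a multiplicative mask on each connection, which at a mask value of 1 equals the weight times the weight gradient. The linear model, the mean squared error loss, the random data, and the function names `connection_sensitivity` and `prune_mask` are all illustrative choices, not the authors' setup.

```python
import numpy as np

def connection_sensitivity(w, x, y):
    """Normalized |w * dL/dw| for a linear model with mean squared error loss."""
    residual = x @ w - y                 # shape (batch, outputs)
    # Gradient of L = (1 / (2 * batch)) * sum ||x w - y||^2 with respect to w.
    grad_w = x.T @ residual / len(x)
    # With a mask c applied as (c * w), dL/dc at c = 1 equals w * dL/dw.
    saliency = np.abs(w * grad_w)
    return saliency / saliency.sum()

def prune_mask(sensitivity, sparsity):
    """Keep the top (1 - sparsity) fraction of connections by sensitivity."""
    k = max(1, int(round((1.0 - sparsity) * sensitivity.size)))
    threshold = np.sort(sensitivity, axis=None)[-k]
    return sensitivity >= threshold

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 20))                 # toy inputs
y = rng.standard_normal((32, 5))                  # toy targets
w = rng.standard_normal((20, 5)) / np.sqrt(20)    # variance-scaled random init

s = connection_sensitivity(w, x, y)
mask = prune_mask(s, sparsity=0.9)
print("connections kept:", int(mask.sum()), "of", mask.size)
```

Because the score is built from a gradient computed at initialization, its reliability depends on how well gradients propagate through the untrained network, which is the link to initialization conditions that the abstract draws.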
