Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks
Qihang Yao, Constantine Dovrolis
TL;DR
This paper reframes sparse neural network training by introducing Path Weight Magnitude Product-biased Random growth (PWMPR), a growth-based paradigm that automatically discovers the operating density during training. Starting from a sparse seed, PWMPR grows connections guided by a Path Weight Magnitude Product (PWMP) score derived from path kernels. Randomization mitigates bottlenecks, and a logistic-fit stopping rule halts growth once accuracy plateaus. Experiments on CIFAR, TinyImageNet, and ImageNet show that PWMPR can approach the performance of lottery tickets derived by iterative magnitude pruning (IMP). It does so at higher densities but at substantially lower training cost (~1.5x the cost of dense training vs. 3–4x for IMP-C). Overall, PWMPR demonstrates that constructive growth is a viable, cost-efficient alternative to pruning and dynamic sparsity, opening the door to hybrid grow-prune methods and broader applicability across architectures and domains.
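The PWMP score is named here but not defined. One plausible reading, sketched below under stated assumptions, is a SynFlow-style path count. For an absent edge from unit i to unit j, multiply the summed |weight| products of all paths from the inputs to i by those of all paths from j to the outputs. The result is the path-magnitude mass the new edge would bridge. Several choices in the sketch are hypothetical and made for illustration only, not taken from the authors' method: the nested-list weight layout, the `eps` weight that mixes score-proportional and uniform sampling, the Efraimidis-Spirakis weighted sampling, and the `init_scale` given to new edges.

```python
import heapq
import random
from typing import List, Tuple

# Hypothetical sketch, not the paper's exact PWMP definition.
# Layout assumption: layers[l][j][i] is the weight from unit i (depth l) to
# unit j (depth l + 1); an exact 0.0 marks an absent edge in the sparse mask.
Layers = List[List[List[float]]]


def forward_flow(layers: Layers) -> List[List[float]]:
    """fwd[d][i]: sum over input-to-unit paths of the product of |weights|."""
    fwd = [[1.0] * len(layers[0][0])]
    for W in layers:
        prev = fwd[-1]
        fwd.append([sum(abs(w) * p for w, p in zip(row, prev)) for row in W])
    return fwd


def backward_flow(layers: Layers) -> List[List[float]]:
    """bwd[d][i]: sum over unit-to-output paths of the product of |weights|."""
    bwd = [[1.0] * len(layers[-1])]
    for W in reversed(layers):
        nxt = bwd[0]
        bwd.insert(0, [sum(abs(W[j][i]) * nxt[j] for j in range(len(W)))
                       for i in range(len(W[0]))])
    return bwd


def candidate_scores(layers: Layers) -> List[Tuple[int, int, int, float]]:
    """Score each absent edge (l, j, i) by the path mass it would bridge."""
    fwd, bwd = forward_flow(layers), backward_flow(layers)
    return [(l, j, i, fwd[l][i] * bwd[l + 1][j])
            for l, W in enumerate(layers)
            for j, row in enumerate(W)
            for i, w in enumerate(row) if w == 0.0]


def grow_edges(layers: Layers, n_new: int, eps: float,
               rng: random.Random, init_scale: float = 0.01):
    """Add up to n_new edges, sampled from a score/uniform mixture (in place)."""
    cands = candidate_scores(layers)
    total = sum(s for *_, s in cands)
    weighted = []
    for l, j, i, s in cands:
        p_score = s / total if total > 0 else 1.0 / len(cands)
        p = (1.0 - eps) * p_score + eps / len(cands)  # eps = uniform share
        if p > 0:
            # Efraimidis-Spirakis key: the top-k keys form a weighted sample
            # drawn without replacement.
            weighted.append((rng.random() ** (1.0 / p), (l, j, i)))
    chosen = [edge for _, edge in heapq.nlargest(n_new, weighted)]
    for l, j, i in chosen:
        # Hypothetical initialization: small magnitude with a random sign.
        layers[l][j][i] = rng.choice((-1.0, 1.0)) * init_scale
    return chosen
```

With `eps=0.0`, a hidden unit that has neither incoming nor outgoing edges scores zero on every candidate edge touching it, so it can never be reconnected. Any `eps > 0` gives those edges nonzero probability. This is one way to interpret the claim that randomization mitigates bottlenecks.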
Abstract
The lottery ticket hypothesis suggests that dense networks contain sparse subnetworks that can be trained in isolation to match full-model performance. Existing approaches (iterative pruning, dynamic sparse training, and pruning at initialization) either incur heavy retraining costs or assume that the target density is fixed in advance. We introduce Path Weight Magnitude Product-biased Random growth (PWMPR), a constructive sparse-to-dense training paradigm that grows networks rather than pruning them while automatically discovering their operating density. Starting from a sparse seed, PWMPR adds edges guided by path-kernel-inspired scores, mitigates bottlenecks via randomization, and stops when a logistic-fit rule detects that accuracy has plateaued. Experiments on CIFAR, TinyImageNet, and ImageNet show that PWMPR approaches the performance of lottery tickets derived by iterative magnitude pruning (IMP). It does so at higher density but at substantially lower cost (~1.5x the cost of dense training vs. 3–4x for IMP). These results establish growth-based density discovery as a promising paradigm that complements pruning and dynamic sparsity.
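The stopping rule is described only as a logistic fit that detects plateauing accuracy. The sketch below assumes that the rule fits acc(d) ≈ L·σ(k(d − d0)) to the (density, accuracy) pairs recorded so far. It stops once the fitted curve at the current density lies within a tolerance `tol` of its asymptote L. The grid-search fitter, the grid ranges, `tol`, and `min_points` are hypothetical parameters chosen for illustration, and the paper's actual criterion may differ.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(densities: Sequence[float],
                 accs: Sequence[float]) -> Tuple[float, float, float]:
    """Fit acc(d) ~ L * sigmoid(k * (d - d0)) by grid search over (k, d0).

    For fixed (k, d0) the model is linear in L, so L has a closed-form
    least-squares solution; keep the (L, k, d0) with the lowest SSE.
    """
    best = (math.inf, 0.0, 0.0, 0.0)
    for k in (2.0 ** p for p in range(8)):          # steepness 1 .. 128
        for d0 in (t / 50 for t in range(51)):       # midpoint 0.00 .. 1.00
            s = [sigmoid(k * (d - d0)) for d in densities]
            ss = sum(v * v for v in s)
            if ss == 0.0:
                continue
            L = sum(a * v for a, v in zip(accs, s)) / ss
            sse = sum((a - L * v) ** 2 for a, v in zip(accs, s))
            if sse < best[0]:
                best = (sse, L, k, d0)
    return best[1], best[2], best[3]


def should_stop_growing(densities: Sequence[float], accs: Sequence[float],
                        tol: float = 0.01, min_points: int = 4) -> bool:
    """Stop once the fit at the current density is within tol of plateau L."""
    if len(densities) < min_points:
        return False  # too few growth rounds for a meaningful fit
    L, k, d0 = fit_logistic(densities, accs)
    gap = L - L * sigmoid(k * (densities[-1] - d0))
    return gap < tol
```

Fitting L in closed form for each (k, d0) keeps the sketch dependency-free. A practical version would more likely use a nonlinear least-squares routine such as SciPy's `curve_fit`.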
