Pruning at Initialization -- A Sketching Perspective
Noga Bar, Raja Giryes
TL;DR
This paper reframes pruning at initialization as a sketching problem for efficient matrix-vector multiplication, showing that finding a sparse mask corresponds to sampling coordinates according to an optimal probability distribution. It derives explicit bounds on the approximation error incurred when the mask found at initialization is applied to the end-of-training weight vector, and it uses the sketching analysis to justify earlier empirical evidence that the search for sparse subnetworks (lottery tickets) can be data independent. The authors connect pruning methods such as SynFlow and SNIP to sketching, propose randomized mask strategies to improve data-free pruning, and extend the perspective to Neural Tangent Kernel pruning with theoretical bounds. Empirically, randomized data-free pruning performs competitively across multiple architectures and datasets, suggesting practical benefits and robustness when data are unavailable. Overall, the sketching lens gives a theoretical grounding for data-independent sparse subnetworks and yields concrete algorithmic improvements for pruning without data.
Abstract
The lottery ticket hypothesis (LTH) has drawn increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at initialization is equivalent to the sketching problem introduced for efficient matrix multiplication. This gives us tools to analyze the LTH problem and gain insights into it. Specifically, using the mask found at initialization, we bound the approximation error of the pruned linear model at the end of training. We theoretically justify previous empirical evidence that the search for sparse networks may be data independent. By using the sketching perspective, we suggest a generic improvement to existing algorithms for pruning at initialization, which we show to be beneficial in the data-independent case.
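To make the link between masks and sketching concrete, the following is a minimal sketch of classic sampling-based approximate matrix-vector multiplication, not the authors' exact algorithm. It assumes a matrix A standing in for the data and a vector x standing in for the weights of a linear model, and it samples coordinates with probability proportional to the column norm of A times |x_i|, the standard importance distribution from the randomized matrix multiplication literature. The function names, the parameter s (number of samples), and the rescaling step are illustrative choices.

```python
import numpy as np

def sampling_probabilities(A, x):
    """Importance scores p_i proportional to ||A[:, i]||_2 * |x_i|."""
    scores = np.linalg.norm(A, axis=0) * np.abs(x)
    total = scores.sum()
    if total == 0:
        # Degenerate case (A @ x is zero): fall back to uniform sampling.
        return np.full(x.shape[0], 1.0 / x.shape[0])
    return scores / total

def sketch_matvec(A, x, s, rng):
    """Approximate A @ x by sampling s coordinates with replacement.

    Returns the approximation and the boolean mask of kept coordinates.
    """
    d = x.shape[0]
    p = sampling_probabilities(A, x)
    idx = rng.choice(d, size=s, replace=True, p=p)
    counts = np.bincount(idx, minlength=d)

    mask = counts > 0  # sparse mask: at most s coordinates survive
    # Rescale kept entries so that the sparse vector equals x in expectation.
    x_sparse = np.zeros_like(x, dtype=float)
    x_sparse[mask] = counts[mask] * x[mask] / (s * p[mask])
    return A @ x_sparse, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 200))   # hypothetical data matrix
    x = rng.standard_normal(200)         # hypothetical weight vector
    approx, mask = sketch_matvec(A, x, s=40, rng=rng)
    rel_err = np.linalg.norm(approx - A @ x) / np.linalg.norm(A @ x)
    print(f"kept {mask.sum()} of {x.size} coordinates, relative error {rel_err:.3f}")
```

The mask here depends on both A and x. A data-free variant, in the spirit of the data-independent case discussed above, would be to drop the column-norm factor and sample from |x_i| alone; that substitution is an assumption made for illustration rather than a description of the paper's method.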
