Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations

Dhananjay Saikumar, Blesson Varghese

TL;DR

This work identifies signal collapse, the progressive drop in activation variance across layers, as the root cause of accuracy loss in one-shot pruning. It shows that the weight selection criterion (magnitude pruning vs. impact-based pruning) matters far less than how pruning disturbs activation flow, and that Hessian-based weight updates provide only part of the recovery. The authors introduce REFLOW, a lightweight batch normalization (BN) recalibration method that restores activation variance without updating weights and achieves state-of-the-art performance across architectures and sparsities (e.g., ResNeXt-101 rises from under $4.1\%$ to $78.9\%$ top-1 accuracy at $80\%$ sparsity, i.e., with only $20\%$ of the weights retained). REFLOW demonstrates that high-quality sparse subnetworks exist within the original parameter space and that preserving signal propagation is key to effective one-shot pruning, offering a practical, gradient-free route to deployment on resource-constrained hardware.

Abstract

Neural network pruning is essential for reducing model complexity to enable deployment on resource-constrained hardware. While performance loss of pruned networks is often attributed to the removal of critical parameters, we identify signal collapse, a reduction in activation variance across layers, as the root cause. Existing one-shot pruning methods focus on weight selection strategies and rely on computationally expensive second-order approximations. In contrast, we demonstrate that mitigating signal collapse, rather than optimizing weight selection, is key to improving accuracy of pruned networks. We propose REFLOW, which addresses signal collapse without updating trainable weights, revealing high-quality sparse subnetworks within the original parameter space. REFLOW enables magnitude pruning to achieve state-of-the-art performance, restoring ResNeXt-101 accuracy from under 4.1% to 78.9% on ImageNet with only 20% of the weights retained, surpassing state-of-the-art approaches.
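
The mechanism described in the abstract can be illustrated on a toy model. The sketch below is not the authors' implementation. It assumes a small random stack of Linear -> BN -> ReLU blocks (BN without learnable scale or shift), layer-wise magnitude pruning at 80% sparsity, and a single calibration batch. The names `magnitude_prune`, `forward`, and all sizes are hypothetical. It compares the per-layer BN output variance of the pruned network against the dense one, first with the dense BN statistics left in place and then after recomputing the BN statistics on the pruned network, with the weights untouched in both cases.

```python
import math
import random

WIDTH, DEPTH, BATCH, SPARSITY, EPS = 64, 5, 64, 0.8, 1e-5


def mean_var(values):
    """Mean and population variance of a sequence of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def matvec(W, x):
    """Multiply weight matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def magnitude_prune(W, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of one layer's weights."""
    flat = sorted(abs(w) for row in W for w in row)
    threshold = flat[int(sparsity * len(flat))]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in W]


def forward(weights, bn_stats, batch, recalibrate=False):
    """Run a batch through Linear -> BN -> ReLU blocks.

    Returns, per layer, the BN output variance averaged over features.
    With recalibrate=True, each layer's stored (mean, variance) pairs are
    replaced by the statistics of the current pre-activations before
    normalising; the weights are never modified.
    """
    signal_var, acts = [], batch
    for layer, W in enumerate(weights):
        Z = [matvec(W, x) for x in acts]
        if recalibrate:
            bn_stats[layer] = [mean_var(col) for col in zip(*Z)]
        S = [[(z - m) / math.sqrt(v + EPS) for z, (m, v) in zip(row, bn_stats[layer])]
             for row in Z]
        per_feature = [mean_var(col)[1] for col in zip(*S)]
        signal_var.append(sum(per_feature) / len(per_feature))
        acts = [[max(s, 0.0) for s in row] for row in S]  # ReLU
    return signal_var


random.seed(0)
weights = [[[random.gauss(0, 1) for _ in range(WIDTH)] for _ in range(WIDTH)]
           for _ in range(DEPTH)]
batch = [[random.gauss(0, 1) for _ in range(WIDTH)] for _ in range(BATCH)]

# Dense reference: BN statistics are calibrated on the dense activations.
dense_stats = [None] * DEPTH
dense_var = forward(weights, dense_stats, batch, recalibrate=True)

# Pruned network that keeps the stale dense BN statistics.
pruned = [magnitude_prune(W, SPARSITY) for W in weights]
stale_var = forward(pruned, list(dense_stats), batch)

# Same pruned weights, BN statistics recomputed from the calibration batch.
recal_var = forward(pruned, list(dense_stats), batch, recalibrate=True)

stale_ratio = [p / d for p, d in zip(stale_var, dense_var)]
recal_ratio = [p / d for p, d in zip(recal_var, dense_var)]

for layer, (s, r) in enumerate(zip(stale_ratio, recal_ratio), start=1):
    print(f"layer {layer}: Var(pruned)/Var(orig) stale={s:.3f} recalibrated={r:.3f}")
```

In this toy setting the recalibrated ratios sit at 1 by construction, because each BN layer is re-standardised on the same batch used for measurement. On a real network, calibration would use held-out data and the stored running statistics of the framework's BN layers.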

Paper Structure

This paper contains 28 sections, 28 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Comparison of test accuracy gain of impact-based pruning methods over magnitude pruning for a pre-trained MobileNet on ImageNet at different sparsity levels. Left: Selection-only pruning methods. Right: Pruning methods with weight updates show significant performance gains.
  • Figure 2: Comparison of test accuracy gain over magnitude pruning for a pre-trained MobileNet (trained on ImageNet) at different sparsity levels.
  • Figure 3: Comparison of test accuracy gain over magnitude pruning for a pre-trained MobileNet (trained on ImageNet) at different sparsity levels.
  • Figure 4: Layer-wise signal variance ratios, $\frac{\mathrm{Var}^{\text{(Pruned)}}}{\mathrm{Var}^{\text{(Orig)}}}$, in pruned MobileNet (on ImageNet). Higher sparsity leads to severe signal collapse in deeper layers.
  • Figure 5: Distribution of predictions made by ResNet-20 on CIFAR-10. The unpruned model predicts uniformly across classes, discriminating between inputs, while the pruned model maps most inputs to a single class.
  • ...and 9 more figures