Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks
Juyoung Yun
TL;DR
This paper addresses the loss of accuracy under aggressive pruning of deep residual networks by combining StochGradAdam, a gradient sampling optimizer, with magnitude-based pruning. The optimizer applies a stochastic mask $\Omega$ to the gradient to obtain the sampled gradient $\phi_t = \Omega \nabla f_t(\theta)$, then updates the parameters with an Adam-style step $\theta_{t+1} = \theta_t - \mu \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$. Pruning uses a percentile threshold $\psi = W_{\text{sorted}}\left(\left\lceil \frac{P}{100} |W| \right\rceil\right)$ for a pruning percentage $P$, setting $w' = 0$ if $|w| < \psi$. On CIFAR-10 with ResNet-56/110/152, StochGradAdam outperforms Adam both before and after pruning, and retains substantially higher accuracy at 50% pruning (ResNet-56: $62.84\%$ vs $33.12\%$, ResNet-110: $76.67\%$ vs $44.85\%$, ResNet-152: $76.23\%$ vs $54.67\%$). These results suggest a practical route to robust, efficient networks for resource-constrained environments, with gradient sampling helping the network retain information as it is slimmed.
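The two formulas above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The Bernoulli mask with a hypothetical `keep_prob`, the standard Adam moment updates with `beta1` and `beta2`, and the function names are assumptions made for illustration, since this summary gives only the sampled gradient, the final update, and the threshold rule.

```python
import math
import numpy as np


def sampled_adam_step(theta, grad, m, v, t, rng, mu=1e-3, beta1=0.9,
                      beta2=0.999, eps=1e-8, keep_prob=0.8):
    """One Adam-style update on a randomly masked gradient (illustrative)."""
    # Omega: binary mask. Keeping each entry independently with probability
    # keep_prob is an assumption; the summary does not specify the sampling.
    omega = (rng.random(grad.shape) < keep_prob).astype(grad.dtype)
    phi = omega * grad                        # phi_t = Omega * grad f_t(theta)
    # Standard Adam moment estimates, computed on the sampled gradient.
    m = beta1 * m + (1.0 - beta1) * phi
    v = beta2 * v + (1.0 - beta2) * phi ** 2
    m_hat = m / (1.0 - beta1 ** t)            # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - mu * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def magnitude_prune(weights, percent):
    """Zero every weight whose magnitude is strictly below the threshold psi."""
    if percent <= 0:
        return weights.copy(), 0.0
    mags = np.sort(np.abs(weights).ravel())       # W_sorted, ascending
    k = math.ceil(percent / 100.0 * mags.size)    # 1-based index into W_sorted
    psi = mags[min(k, mags.size) - 1]
    pruned = np.where(np.abs(weights) < psi, 0.0, weights)
    return pruned, psi
```

Because the comparison $|w| < \psi$ is strict, the weight sitting exactly at the threshold is kept, so with distinct magnitudes the fraction actually zeroed is slightly below $P$ percent. In a training loop under these assumptions, `sampled_adam_step` would be called once per step with `t` counting from 1, and `magnitude_prune` would be applied to each weight array after training.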
Abstract
This research integrates gradient sampling optimization techniques, particularly StochGradAdam, into the pruning process of neural networks. Our main objective is to address the challenge of maintaining accuracy in pruned neural models, which is critical in resource-constrained scenarios. Through extensive experimentation, we demonstrate that gradient sampling preserves accuracy during and after pruning significantly better than traditional optimization methods. Our study highlights the role of gradient sampling in robust learning and in retaining crucial information after substantial model simplification. The results on the CIFAR-10 dataset across residual neural architectures validate the versatility and effectiveness of our approach. This work presents a promising direction for developing efficient neural networks without compromising performance, even in environments with limited computational resources.
