Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks

Juyoung Yun

TL;DR

This paper tackles the challenge of maintaining accuracy during aggressive pruning of deep residual networks by integrating a gradient sampling optimizer, StochGradAdam, with Magnitude-Based Pruning. The method applies a stochastic gradient mask $\Omega$ to compute sampled gradients $\phi_t = \Omega \nabla f_t(\theta)$ and updates parameters with $\theta_{t+1} = \theta_t - \mu \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$, while pruning weights via a percentile threshold $\psi = W_{\text{sorted}}\left(\left\lceil \frac{P}{100} |W| \right\rceil\right)$ so that $w' = 0$ if $|w| < \psi$. On CIFAR-10 with ResNet-56/110/152, StochGradAdam consistently outperforms Adam both before and after pruning, and retains substantially higher accuracy at 50% pruning (StochGradAdam vs Adam: ResNet-56, $62.84\%$ vs $33.12\%$; ResNet-110, $76.67\%$ vs $44.85\%$; ResNet-152, $76.23\%$ vs $54.67\%$). These results suggest a practical route to robust, efficient networks for resource-constrained environments, where gradient sampling contributes to better information retention during slimming.
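The update rule above can be illustrated with a minimal NumPy sketch. It is not the author's implementation: it assumes that $\Omega$ is an element-wise Bernoulli mask with a hypothetical `sampling_rate` parameter, and that $\hat{m}_t$ and $\hat{v}_t$ are the usual bias-corrected Adam moments computed from the sampled gradient $\phi_t$. The function name and default hyperparameters are illustrative only.

```python
import numpy as np

def stochgrad_adam_step(theta, grad, m, v, t, rng, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8,
                        sampling_rate=0.8):
    """One Adam-style step on a randomly masked gradient (illustrative sketch)."""
    # Assumption: Omega keeps each gradient entry independently
    # with probability `sampling_rate` and zeroes the rest.
    omega = rng.random(grad.shape) < sampling_rate
    phi = omega * grad                       # phi_t = Omega * grad f_t(theta)

    # Assumption: standard Adam moment estimates, driven by phi_t.
    m = beta1 * m + (1.0 - beta1) * phi
    v = beta2 * v + (1.0 - beta2) * phi ** 2
    m_hat = m / (1.0 - beta1 ** t)           # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)

    # theta_{t+1} = theta_t - mu * m_hat / (sqrt(v_hat) + eps)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimise f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 3.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 201):
    theta, m, v = stochgrad_adam_step(theta, theta.copy(), m, v, t, rng, lr=0.05)
```

With `sampling_rate=1.0` the mask keeps every entry and the sketch reduces to plain Adam, which makes it easy to compare the two optimizers on the same toy problem.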

Abstract

This work integrates gradient sampling optimization, specifically StochGradAdam, into the pruning process of neural networks. Our main objective is to address the challenge of maintaining accuracy in pruned neural models, which is critical in resource-constrained scenarios. Through extensive experiments, we demonstrate that gradient sampling preserves accuracy during and after pruning significantly better than traditional optimization methods. Our study highlights the role of gradient sampling in robust learning and in retaining crucial information after substantial model simplification. Results on the CIFAR-10 dataset across residual neural architectures support the versatility and effectiveness of the approach. This work presents a promising direction for developing efficient neural networks without compromising performance, even in environments with limited computational resources.

Paper Structure (9 sections, 16 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Related Works
Method
Training with StochGradAdam
Pruning Process
Theoretical Analysis
Experiments
Discussion
Conclusion

Figures (6)

Figure 1: Test accuracy before and after 50% pruning of ResNet models [He2016ResNet] trained with the StochGradAdam [Yun2023StochGradAdam] (denoted SG-Adam) and Adam [Kingma2014Adam] optimizers on the CIFAR-10 dataset [Krizhevsky2009CIFAR10]
Figure 2: Illustration of sampled gradient matrices/tensors for fully-connected and convolutional layers. On the left, the original gradients are presented with $C_{in}$ and $C_{out}$ denoting the number of input and output channels, respectively. On the right, the effect of gradient sampling is shown, where the application of a sampling function $\phi=\Omega(\nabla f(\theta))$ retains significant gradient components, represented by the white areas, and eliminates less significant ones, depicted as darkened blocks
Figure 3: This diagram demonstrates the application of Magnitude-Based Pruning with Percentile Threshold across all layers of a Convolutional Neural Network (CNN). It visualizes the process of pruning by setting weights to zero if they fall below a specific magnitude percentile, which is represented by the black areas in each layer.
Figure 4: Weight distributions of ResNet architectures trained with Adam and StochGradAdam optimizers. The top row shows histograms for ResNet-56, ResNet-110, and ResNet-152 trained with a standard Adam optimizer, exhibiting typical bell-shaped distributions centered around zero. The bottom histogram represents a ResNet model trained with the StochGradAdam optimizer, indicating a wider spread in weight values.
Figure 5: Test accuracy comparison of the StochGradAdam [Yun2023StochGradAdam] and Adam [Kingma2014Adam] optimizers on ResNet architectures [He2016ResNet] on the CIFAR-10 dataset [Krizhevsky2009CIFAR10]
...and 1 more figures
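
The magnitude-based pruning with a percentile threshold described in the TL;DR and in the Figure 3 caption can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes $W_{\text{sorted}}$ holds the absolute weight values in ascending order and is indexed from 1, as the formula $\psi = W_{\text{sorted}}\left(\left\lceil \frac{P}{100} |W| \right\rceil\right)$ suggests. Under a 0-based reading, one more weight would fall below the threshold. The function name `magnitude_prune` is hypothetical.

```python
import math
import numpy as np

def magnitude_prune(weights, percent):
    """Zero every weight whose magnitude is strictly below the percentile threshold."""
    magnitudes = np.sort(np.abs(weights).ravel())     # W_sorted, ascending
    # Assumption: 1-based index k = ceil(P/100 * |W|), clamped to at least 1.
    k = max(math.ceil(percent / 100 * magnitudes.size), 1)
    psi = magnitudes[k - 1]                           # threshold psi
    pruned = np.where(np.abs(weights) < psi, 0.0, weights)   # w' = 0 if |w| < psi
    return pruned, psi


# Toy usage with P = 50 on six weights.
w = np.array([0.5, -0.1, 0.3, -0.8, 0.05, 0.2])
pruned, psi = magnitude_prune(w, 50)
```

Because the comparison is strict, weights whose magnitude equals $\psi$ are kept, so the fraction actually zeroed can be slightly below $P$ percent.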