DRIVE: Dual Gradient-Based Rapid Iterative Pruning

Dhananjay Saikumar; Blesson Varghese

TL;DR

DRIVE addresses the high computational cost of pruning large DNNs by combining a brief dense training phase with a dual gradient-based ranking that identifies redundant parameters before full training. The method combines the L1 norm, connection sensitivity, and convergence sensitivity into a single scoring criterion, enabling iterative pruning that preserves trainability at high sparsity. Across AlexNet, VGG-16, and ResNet-18 on CIFAR-10/100, Tiny ImageNet, and ImageNet, DRIVE consistently outperforms training-agnostic methods such as SNIP and SynFlow and approaches or matches IMP in accuracy, while pruning 43x to 869x faster than IMP. This makes DRIVE a practical option for on-demand, energy-efficient sparse network deployment with little loss in accuracy.

Abstract

Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference. Pruning is one solution that significantly reduces the space and time complexities of DNNs. Traditional pruning methods applied post-training focus on streamlining inference, but recent efforts leverage sparsity early on by pruning before training. Pruning methods such as iterative magnitude-based pruning (IMP) achieve up to a 90% parameter reduction while retaining accuracy comparable to the original model. However, IMP has an impractical runtime because it relies on multiple train-prune-reset cycles to identify and eliminate redundant parameters. In contrast, training-agnostic early pruning methods such as SNIP and SynFlow offer fast pruning but fall short of the accuracy achieved by IMP at high sparsities. To bridge this gap, we present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent at initialization. It then employs a unique dual gradient-based metric for parameter ranking. Experiments with VGG and ResNet architectures on CIFAR-10/100 and Tiny ImageNet, and with ResNet on ImageNet, show that DRIVE consistently achieves higher accuracy than other training-agnostic early pruning methods. Notably, DRIVE is 43$\times$ to 869$\times$ faster than IMP for pruning.
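
The iterative, score-based pruning described above can be illustrated with a minimal sketch. This is not the paper's metric or schedule: the score form (magnitude plus two gradient terms), the weights `alpha` and `beta`, the exponential keep schedule, and the `grad_fn` callback (standing in for gradients taken from the briefly trained dense network) are all assumptions chosen for illustration.

```python
from typing import Callable, List


def combined_score(w: float, g: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical score: |w| plus a weight-gradient term plus a gradient term.

    The exact combination used by DRIVE is not given in this text; this form
    and the alpha/beta weights are assumptions.
    """
    return abs(w) + alpha * abs(w * g) + beta * abs(g)


def iterative_prune(
    weights: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    target_sparsity: float,
    rounds: int,
) -> List[bool]:
    """Return a keep-mask after pruning in several rounds.

    Assumed schedule: after round k, n * (1 - target_sparsity) ** (k / rounds)
    parameters remain, so the final round reaches the target sparsity.
    """
    n = len(weights)
    mask = [True] * n
    for k in range(1, rounds + 1):
        keep = max(1, round(n * (1.0 - target_sparsity) ** (k / rounds)))
        # Pruned parameters are held at zero when gradients are recomputed.
        masked = [w if m else 0.0 for w, m in zip(weights, mask)]
        grads = grad_fn(masked)
        # Rank surviving parameters by score, highest first.
        alive = [i for i in range(n) if mask[i]]
        alive.sort(key=lambda i: combined_score(masked[i], grads[i]), reverse=True)
        # Drop everything below the keep threshold for this round.
        for i in alive[keep:]:
            mask[i] = False
    return mask


# Toy usage: gradients of the quadratic loss 0.05 * sum(w**2).
toy_weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.001]
toy_mask = iterative_prune(toy_weights, lambda w: [0.1 * x for x in w], 0.75, 2)
```

Recomputing the scores each round, rather than ranking once, is what makes the loop iterative: parameters that survive a round are re-ranked with gradients that reflect the already pruned network.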

Paper Structure (8 sections, 10 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Background & Related Work
DRIVE Design
Pruning Metric
Experiments
Setup
Results
Conclusions

Figures (6)

Figure 1: Progression of Iterative Magnitude Pruning (IMP) that starts from a dense DNN and has successive train-prune-reset cycles. Neurons are shown as circles, yellow lines indicate weights at random initialization and teal lines show weights post-training.
Figure 2: Comparing test accuracy of sparse networks derived using early pruning methods for AlexNet with 99.3% parameters removed and trained on the CIFAR-10 dataset.
Figure 3: Change in test accuracy of trained sparse networks produced by SNIP and SynFlow relative to IMP, plotted against the network sparsity for ResNet-18 on the CIFAR-100 dataset.
Figure 4: DRIVE initially trains an unpruned network and then iteratively prunes the network. Neurons are circles. Yellow lines are initial weights and teal lines are weights after brief training.
Figure 5: Accuracy vs. sparsity for AlexNet on CIFAR-10. The horizontal line is the accuracy of an unpruned trained network.
...and 1 more figure
