A Proximal Algorithm for Network Slimming

Kevin Bui; Fanghui Xue; Fredrick Park; Yingyong Qi; Jack Xin

TL;DR

This work tackles channel pruning in CNNs by replacing the subgradient-based training of network slimming (NS) with a proximal alternating linearized minimization (PALM) framework that trains toward sparse, accurate network structures. It reformulates the problem with an auxiliary variable $\xi$ and derives explicit update rules for $(W,\gamma)$ and for $\xi$; the $\xi$ update is a soft-thresholding step, so sparsification happens automatically and no post hoc pruning threshold is needed. Under Kurdyka-Łojasiewicz assumptions and with $\alpha > L$, the method provably converges to a critical point. Experiments on VGGNet, DenseNet, and ResNet-164 for CIFAR-10/100 show that many BN scaling factors become exactly zero after one training pass, giving competitive accuracy and substantial compression without mandatory fine tuning. Overall, proximal NS reduces the original three-step NS pipeline to a single prune-while-training phase, which shortens the pipeline and simplifies deployment while maintaining performance on standard benchmarks. Nonconvex regularizers and architecture search are left as avenues for further improvement.
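To illustrate why soft-thresholding removes the need for a post hoc threshold, the following minimal sketch compares one soft-thresholding step with one subgradient step on an $\ell_1$ penalty. The function names, the sample scaling factors, and the values of `tau`, `lr`, and `lam` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry by tau, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def subgradient_step(x, lr, lam):
    """One subgradient step on lam * ||x||_1 (np.sign(0) = 0 is a valid subgradient at 0)."""
    return x - lr * lam * np.sign(x)

# Hypothetical batch-normalization scaling factors (illustrative values only).
gamma = np.array([0.8, -0.05, 0.02, -0.6, 0.001])
tau = 0.1  # hypothetical threshold

xi = soft_threshold(gamma, tau)                        # small entries become exactly zero
gamma_sub = subgradient_step(gamma, lr=0.1, lam=1.0)   # small entries overshoot past zero

print("soft-thresholding nonzeros:", np.count_nonzero(xi))
print("subgradient step nonzeros: ", np.count_nonzero(gamma_sub))
```

In this toy case the soft-thresholded vector contains exact zeros, while the subgradient step only moves small entries across zero, which is why a separate pruning threshold is needed in the subgradient setting.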

Abstract

As a popular channel pruning method for convolutional neural networks (CNNs), network slimming (NS) has a three-stage process: (1) it trains a CNN with $\ell_1$ regularization applied to the scaling factors of the batch normalization layers; (2) it removes channels whose scaling factors are below a chosen threshold; and (3) it retrains the pruned model to recover the original accuracy. This time-consuming, three-step process is a result of using subgradient descent to train CNNs. Because subgradient descent does not exactly train CNNs towards sparse, accurate structures, the latter two steps are necessary. Moreover, subgradient descent does not have any convergence guarantee. Therefore, we develop an alternative algorithm called proximal NS. Our proposed algorithm trains CNNs towards sparse, accurate structures, so identifying a scaling factor threshold is unnecessary and fine tuning the pruned CNNs is optional. Using Kurdyka-Łojasiewicz assumptions, we establish global convergence of proximal NS. Lastly, we validate the efficacy of the proposed algorithm on VGGNet, DenseNet and ResNet on CIFAR 10/100. Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
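The difference in how channels are selected can be sketched as follows. The first helper mimics stage (2) of the three-stage process, where a threshold has to be chosen; using a quantile of the magnitudes to set that threshold is an assumption of this sketch. The second helper reflects the setting described for proximal NS, where channels with exactly zero scaling factors can be dropped directly. All names and sample values are hypothetical.

```python
import numpy as np

def keep_mask_by_threshold(gammas, prune_ratio):
    """Keep channels whose |gamma| exceeds a quantile-based threshold (assumed rule)."""
    magnitudes = np.abs(gammas)
    threshold = np.quantile(magnitudes, prune_ratio)
    return magnitudes > threshold

def keep_mask_by_zeros(gammas):
    """Keep channels whose scaling factor is not exactly zero; no threshold to tune."""
    return gammas != 0.0

# Scaling factors after subgradient training: small but nonzero (illustrative values).
dense = np.array([0.9, -0.03, 0.5, 0.01, -0.7, 0.02])
# Scaling factors after a proximal-style training run: exact zeros (illustrative values).
sparse = np.array([0.9, 0.0, 0.5, 0.0, -0.7, 0.0])

mask_threshold = keep_mask_by_threshold(dense, prune_ratio=0.5)
mask_zeros = keep_mask_by_zeros(sparse)
print(mask_threshold.tolist())
print(mask_zeros.tolist())
```

The threshold-based mask depends on the chosen `prune_ratio`, whereas the zero-based mask is read off the trained scaling factors.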

Paper Structure (13 sections, 5 theorems, 32 equations, 3 tables, 1 algorithm)


Introduction
Related Works
Proposed Algorithm
Batch Normalization Layer
Numerical Optimization
$(W,\gamma)$-subproblem
$\xi$-subproblem
Convergence Analysis
Numerical Experiments
Implementation Details
Results
Conclusion
Appendix

Key Result

Theorem

Under the paper's assumption on the loss function, if the sequence $\{(W^t, \gamma^t, \xi^t)\}_{t=1}^{\infty}$ generated by the proximal NS algorithm is bounded and $\alpha > L$, then it converges to a critical point $(W^*, \gamma^*, \xi^*)$ of $F$.
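The role of the condition $\alpha > L$ can be illustrated on a toy problem. This summary does not state $F$ or the update rules, so the sketch below assumes an objective of the form $F(\gamma, \xi) = \ell(\gamma) + \lambda\|\xi\|_1 + \tfrac{\beta}{2}\|\gamma - \xi\|_2^2$ with a least-squares $\ell$, takes $L$ to be the Lipschitz constant of $\nabla \ell$, and drops the weights $W$. The update formulas, the parameter values, and all names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def objective(A, b, gamma, xi, lam, beta):
    """Assumed objective: least-squares loss + l1 penalty on xi + quadratic coupling."""
    return (0.5 * np.sum((A @ gamma - b) ** 2)
            + lam * np.sum(np.abs(xi))
            + 0.5 * beta * np.sum((gamma - xi) ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
b = A @ np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.7])

lam, beta = 1.0, 5.0                         # hypothetical penalty parameters
L = np.linalg.eigvalsh(A.T @ A).max()        # Lipschitz constant of the toy gradient
alpha = 1.1 * L                              # satisfies alpha > L

gamma = np.ones(8)
xi = gamma.copy()
history = [objective(A, b, gamma, xi, lam, beta)]
for _ in range(500):
    grad = A.T @ (A @ gamma - b)
    # Minimizer of the linearized loss + (alpha/2)||gamma - gamma_t||^2 + (beta/2)||gamma - xi_t||^2.
    gamma = (alpha * gamma + beta * xi - grad) / (alpha + beta)
    # Exact minimizer over xi: soft-thresholding with threshold lam / beta.
    xi = soft_threshold(gamma, lam / beta)
    history.append(objective(A, b, gamma, xi, lam, beta))

print("objective: first", history[0], "last", history[-1])
print("exact zeros in xi:", int(np.sum(xi == 0.0)))
```

With $\alpha > L$, each iteration of this toy scheme does not increase the assumed objective, which mirrors the sufficient decrease property listed among the paper's lemmas.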

Theorems & Definitions (11)

Definition (bolte2014proximal)
Remark
Theorem
Definition (rockafellar2009variational)
Lemma: Strong Convexity Lemma (beck2017first)
Lemma: Descent Lemma (beck2017first)
Lemma: Sufficient Decrease
Proof
Lemma: Relative Error Property
Proof
...and 1 more
