Deep Convolutional Neural Networks Structured Pruning via Gravity Regularization
Abdesselam Ferdi
TL;DR
This work introduces gravity-based regularization for structured pruning of deep convolutional neural networks (DCNNs). A gravity penalty added to the training cost redistributes convolutional filter weights around an attracting filter, based on each filter's mass $m_{n,l} = \|W_{n,l}\|_1$ and its distance $d_{n,l}$ from the attracting filter. The gravity term $F_{n,l} = G\frac{m_{1,l}m_{n,l}}{d_{n,l}^2}$, where $m_{1,l}$ is the mass of the attracting filter, enters the cost with coefficient $\alpha_g$. It biases filters near the attracting filter to retain nonzero weights and drives distant filters toward zero, so those filters can be pruned after training without architectural changes. Experiments on CIFAR with ResNet-56 and VGG-19 show competitive pruning performance, at the cost of higher training overhead from the gravity penalty; fine-tuned results indicate robust post-pruning accuracy at various pruning ratios. The method offers a practical, adaptive way to accelerate DCNNs by reducing FLOPs and parameters while preserving essential information, with potential applicability beyond the CIFAR benchmarks.
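The gravity term above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that each filter is given as a flat list of weights, that the first filter in a layer is the attracting one, that the distances $d_{n,l}$ are supplied by the caller (the summary does not say how they are computed), and that the per-filter terms are summed before scaling by $\alpha_g$. The function names `l1_mass`, `gravity_terms`, and `gravity_penalty` are hypothetical.

```python
def l1_mass(filter_weights):
    """Mass of a filter: the L1 norm of its flattened weights."""
    return sum(abs(w) for w in filter_weights)


def gravity_terms(layer_filters, distances, G=1.0):
    """Compute F_n = G * m_1 * m_n / d_n**2 for every non-attracting filter.

    Assumptions (not stated in the source):
      - layer_filters[0] is the attracting filter;
      - distances[n] is the distance of filter n from the attracting
        filter, and distances[0] is ignored.
    """
    m_attract = l1_mass(layer_filters[0])
    terms = []
    for weights, d in zip(layer_filters[1:], distances[1:]):
        terms.append(G * m_attract * l1_mass(weights) / d ** 2)
    return terms


def gravity_penalty(layer_filters, distances, alpha_g, G=1.0):
    """Layer penalty: alpha_g times the sum of the gravity terms.

    Summing over filters is an assumption made for this sketch.
    """
    return alpha_g * sum(gravity_terms(layer_filters, distances, G))


# Toy layer with three filters of two weights each.
filters = [[1.0, -1.0], [0.5, 0.5], [2.0, -2.0]]
distances = [0.0, 1.0, 2.0]  # distances[0] is unused

terms = gravity_terms(filters, distances)
penalty = gravity_penalty(filters, distances, alpha_g=0.1)
```

In a training loop, a penalty of this form would be added to the task loss for each convolutional layer; how the terms are aggregated across layers is likewise left open here.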
Abstract
Structured pruning is a widely used strategy for accelerating deep convolutional neural networks (DCNNs). However, existing methods often require modifications to the original architecture, involve complex implementations, and need lengthy fine-tuning stages. To address these challenges, we propose a physics-inspired approach that integrates the concept of gravity into the training stage of DCNNs. In this approach, the gravitational force is directly proportional to the product of the masses of the convolution filter and the attracting filter, and inversely proportional to the square of the distance between them. We apply this force to the convolution filters: filters closer to the attracting filter (experiencing weaker gravity) are drawn toward non-zero weights, while filters farther away (subject to stronger gravity) are pulled toward zero weights. As a result, filters under stronger gravity have their weights reduced to zero and can be removed, while filters under weaker gravity retain significant weights and preserve important information. Our method simultaneously optimizes the filter weights and ranks their importance, eliminating the need for complex implementations or extensive fine-tuning. We validate the proposed approach on popular DCNN architectures using the CIFAR dataset, achieving competitive results compared to existing methods.
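The removal step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from the paper. It assumes that filters are ranked by their L1 norm after training and that a fixed fraction with the smallest norms is removed per layer. The function names and the `prune_ratio` parameter are hypothetical.

```python
def filter_l1_norms(layer_filters):
    """L1 norm of each filter, with every filter given as a flat list of weights."""
    return [sum(abs(w) for w in f) for f in layer_filters]


def select_filters_to_keep(layer_filters, prune_ratio):
    """Return the indices of the filters kept after pruning, in original order.

    Assumption: the int(len * prune_ratio) filters with the smallest
    L1 norm are removed; the source does not specify the exact rule.
    """
    norms = filter_l1_norms(layer_filters)
    n_prune = int(len(layer_filters) * prune_ratio)
    # Rank filter indices from smallest to largest norm.
    ranked = sorted(range(len(norms)), key=lambda i: norms[i])
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(norms)) if i not in pruned]


def prune_layer(layer_filters, prune_ratio):
    """Drop the lowest-norm filters and return the remaining ones."""
    keep = select_filters_to_keep(layer_filters, prune_ratio)
    return [layer_filters[i] for i in keep]


# Toy layer: two filters were driven close to zero during training.
trained = [[0.9, -0.8], [0.0, 0.001], [0.5, 0.4], [0.0, 0.0]]
kept_indices = select_filters_to_keep(trained, prune_ratio=0.5)
pruned_layer = prune_layer(trained, prune_ratio=0.5)
```

In a real network, removing a filter from one layer also removes the matching input channel of the next layer, which this sketch does not model.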
