Towards Generalized Entropic Sparsification for Convolutional Neural Networks
Tin Barisin, Illia Horenko
TL;DR
The paper tackles CNN overparameterization by proposing a layer-by-layer, data-driven pruning method based on entropic relaxation. It recasts convolutional layers as linear regression problems and extends SPARTAn to perform sparse entropic regression, yielding a structured channel-wise sparsification with the rule $\hat{Q}^{\mathrm{vec}} = \Lambda D(w)$ and an objective combining entropy regularization, $L_2$ penalty, and MSE. Empirically, the approach achieves substantial sparsity (e.g., 55–84% on MNIST LeNet; 73–89% on CIFAR-10 VGG-16/ResNet18) with minimal accuracy loss (0.1–0.5%), while also reducing FLOPs and memory usage significantly. The method demonstrates the potential for discovering near-optimal compressed architectures from pre-trained models, while leaving open questions on hyperparameter optimization, transfer to other datasets, and robustness to adversarial settings.
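The recasting of a convolutional layer as a linear regression can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes stride 1, no padding, that $D(w)$ is a diagonal matrix built from per-channel weights $w$ on the probability simplex, and that the three objective terms are reported separately, since their signs and weighting are not specified in this summary. The names `im2col`, `objective_terms`, `eps_l2`, and the toy shapes are hypothetical.

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) input into rows of flattened k x k patches
    (stride 1, no padding). Columns are ordered channel-major."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((out_h * out_w, C * k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

def objective_terms(X, Y, Lam, w, k, eps_l2=1e-3):
    """Return the MSE, L2 penalty, and entropy of w as separate terms.
    Assumption: Q = Lam @ diag(d), where d repeats each channel weight
    over that channel's k*k columns."""
    d = np.repeat(w, k * k)            # per-channel -> per-column weights
    resid = Y - (X * d) @ Lam.T        # same as X @ (Lam @ diag(d)).T
    mse = np.mean(resid ** 2)
    l2 = eps_l2 * np.sum(Lam ** 2)
    entropy = -np.sum(w * np.log(np.maximum(w, 1e-12)))
    return {"mse": mse, "l2": l2, "entropy": entropy}

rng = np.random.default_rng(0)
C, H, W, k, C_out = 3, 6, 6, 3, 2
x = rng.normal(size=(C, H, W))
kernel = rng.normal(size=(C_out, C, k, k))

X = im2col(x, k)                       # one row per output pixel
Lam = kernel.reshape(C_out, -1)        # one row per output filter
Y = X @ Lam.T                          # layer output as a linear map

w = np.full(C, 1.0 / C)                # uniform channel weights
terms = objective_terms(X, Y, Lam, w, k)
```

With this layout, driving a channel weight in `w` to zero removes that channel's whole block of `k * k` columns from the regression, which is what makes the resulting sparsity structured (channel-wise) rather than element-wise.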
Abstract
Convolutional neural networks (CNNs) are reported to be overparametrized. The search for an optimal (minimal) and sufficient architecture is an NP-hard problem, as the hyperparameter space of possible network configurations is vast. Here, we introduce a layer-by-layer, data-driven pruning method based on a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pre-trained (full) CNN using network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): (i) MNIST (LeNet), with sparsity 55%-84% and loss in accuracy 0.1%-0.5%, and (ii) CIFAR-10 (VGG-16, ResNet18), with sparsity 73%-89% and loss in accuracy 0.1%-0.5%.
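As a rough illustration of how channel-wise weights could turn into a smaller network, the sketch below removes the input channels whose weight is numerically zero, together with the matching output filters of the preceding layer, and reports the fraction of parameters removed. This is an assumption-laden toy: the abstract does not define how sparsity is measured, and `prune_channels`, `sparsity`, the tolerance `tol`, and the example weights are hypothetical.

```python
import numpy as np

def prune_channels(prev_kernel, kernel, w, tol=1e-8):
    """Drop input channels of `kernel` whose weight in w is ~0, and the
    matching output filters of the preceding layer `prev_kernel`.
    Kernels are laid out as (out_channels, in_channels, k, k)."""
    keep = w > tol
    return prev_kernel[keep], kernel[:, keep], keep

def sparsity(before, after):
    """Fraction of parameters removed (one possible definition)."""
    n_before = sum(b.size for b in before)
    n_after = sum(a.size for a in after)
    return 1.0 - n_after / n_before

prev_kernel = np.ones((4, 1, 3, 3))    # 4 filters feeding the next layer
kernel = np.ones((2, 4, 3, 3))         # 2 filters over 4 input channels
w = np.array([0.7, 0.0, 0.3, 0.0])     # hypothetical channel weights

pruned_prev, pruned_kernel, keep = prune_channels(prev_kernel, kernel, w)
removed = sparsity([prev_kernel, kernel], [pruned_prev, pruned_kernel])
```

Because whole channels are removed, the pruned kernels remain dense arrays of smaller shape, so reductions in parameters carry over to FLOPs and memory without needing sparse-matrix support.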
