Towards Generalized Entropic Sparsification for Convolutional Neural Networks

Tin Barisin; Illia Horenko

TL;DR

The paper tackles CNN overparameterization by proposing a layer-by-layer, data-driven pruning method based on entropic relaxation. It recasts convolutional layers as linear regression problems and extends SPARTAn to perform sparse entropic regression, yielding a structured channel-wise sparsification with the rule $\hat{Q}^{vec} = \Lambda D(w)$ and an objective combining entropy regularization, an $L_2$ penalty, and MSE. Empirically, the approach achieves substantial sparsity (e.g., 55–84% on MNIST LeNet; 73–89% on CIFAR-10 VGG-16/ResNet18) with minimal accuracy loss (0.1–0.5%), while also reducing FLOPs and memory usage significantly. The method demonstrates the potential for discovering near-optimal compressed architectures from pre-trained models, while leaving open questions on hyperparameter optimization, transfer to other datasets, and robustness to adversarial settings.
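The recasting of a convolutional layer as a linear regression problem can be illustrated with a small NumPy sketch. It is not the authors' implementation: the helper names `im2col` and `conv2d_direct`, the toy shapes, and the hand-picked channel weights `w_channels` are assumptions made for illustration, and the sketch assumes that $D(w)$ repeats each input-channel weight across that channel's $k \times k$ kernel entries. It shows that a convolution equals one matrix product over unfolded patches, and that zero entries in the channel weights are equivalent to removing those input channels.

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) feature map into an (H*W, C*k*k) patch matrix ('same' zero padding)."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    cols = np.empty((h * w, c * k * k))
    for i in range(h):
        for j in range(w):
            cols[i * w + j] = xp[:, i:i + k, j:j + k].ravel()
    return cols

def conv2d_direct(x, weight):
    """Reference cross-correlation, i.e. what CNN 'convolution' layers compute."""
    c_out, _, k, _ = weight.shape
    _, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, w))
    for o in range(c_out):
        for i in range(h):
            for j in range(w):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * weight[o])
    return out

rng = np.random.default_rng(0)
c_in, c_out, k, h, w = 4, 3, 3, 6, 6           # toy sizes (assumed)
x = rng.normal(size=(c_in, h, w))
weight = rng.normal(size=(c_out, c_in, k, k))

# Convolution as a linear map applied at every pixel: Y = X @ Lam^T.
X = im2col(x, k)                                # (h*w, c_in*k*k)
Lam = weight.reshape(c_out, -1)                 # (c_out, c_in*k*k)
y_linear = (X @ Lam.T).T.reshape(c_out, h, w)
y_direct = conv2d_direct(x, weight)

# Channel-wise mask: Q = Lam @ D(w), with w set by hand here (not learned).
w_channels = np.array([1.0, 0.0, 1.0, 0.0])
D = np.diag(np.repeat(w_channels, k * k))       # one weight per input channel
Q = Lam @ D
y_masked = (X @ Q.T).T.reshape(c_out, h, w)

# Zeroed channels can be dropped from both the input and the kernel.
keep = w_channels > 0
y_pruned = conv2d_direct(x[keep], weight[:, keep])

print(np.allclose(y_linear, y_direct), np.allclose(y_masked, y_pruned))
```

In the method itself the channel weights would come from the sparse entropic regression rather than being fixed by hand; the sketch only demonstrates why a sparse $w$ translates into a structurally smaller layer.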

Abstract

Convolutional neural networks (CNNs) are reported to be overparametrized. The search for an optimal (minimal) and sufficient architecture is an NP-hard problem, as the hyperparameter space of possible network configurations is vast. Here, we introduce a layer-by-layer, data-driven pruning method based on a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pre-trained (full) CNN using network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): (i) MNIST (LeNet) with sparsity 55%-84% and loss in accuracy 0.1%-0.5%, and (ii) CIFAR-10 (VGG-16, ResNet18) with sparsity 73%-89% and loss in accuracy 0.1%-0.5%.

Paper Structure (23 sections, 21 equations, 3 figures, 12 tables)

Introduction
Entropy and machine learning
Related work: network pruning
Method
Notation and basics on convolutional layers in deep networks
Problem formulation: sparsification for convolutional layers
Interpreting convolutional layer as linear layer
Generalized entropic sparsification for convolutional layers
Sparse entropic regression for convolutional layers
Connection to the entropic sparsification of the fully connected layers
Experiments
Sparsifying LeNet on MNIST
CIFAR-10
Sparsifying VGG-16
Sparsifying ResNet18
...and 8 more sections

Figures (3)

Figure 1: Illustration: recasting convolutional layer conv1 from VGG-16 (Table \ref{tab:VGG-overview}, Appendix \ref{ref:net:details}) as a fully connected layer at every point $(x,y)$ of the image domain, and applying the sparsification method by sparsely solving the linear system of equations in $D(W)$ and $\Lambda$ with the SPARTAn algorithm; see (\ref{spartan_eq}) in Section \ref{sec:gesCONV}. Layer conv1 transforms a feature map of dimension $64\times32\times32$ into an output of the same dimension using $64\times64$ convolutions with masks of size $3\times 3$.
Figure 2: LeNet \cite{lecun98} is an example of a convolutional neural network that consists of both convolutional and fully connected layers. This figure visualizes the architecture described in Table \ref{tab:lenet} (Appendix \ref{ref:net:details}). Both convolutional layers use windows of size $5\times5$.
Figure 3: Illustration of one convolutional block with a residual connection from ResNet18 (Table \ref{tab:resnet18-overview}): conv5, conv6, and conv5 shortcut. Here, every "conv" box includes convolution, batch normalization, and the ReLU non-linearity. Note that "conv5 shortcut" is needed to match the number of channels (64) of the output of "conv6", as $x$ has 32 channels. When the numbers of channels match, this operator is omitted from the residual connection.
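
The channel-matching role of the shortcut described for Figure 3 can be sketched in a few lines of NumPy. This is a simplified illustration, not the ResNet18 code used in the paper: it assumes a $1\times1$ convolution for the shortcut and leaves out batch normalization, ReLU, and any stride, and the names `conv1x1` and `residual_add` are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: mixes channels independently at every pixel.
    x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def residual_add(x, fx, shortcut_weight=None):
    """Add the block output fx to its input x.
    If the channel counts differ, x is first projected by a 1x1 shortcut."""
    if shortcut_weight is None:
        if x.shape != fx.shape:
            raise ValueError("shapes differ: a shortcut projection is needed")
        return fx + x
    return fx + conv1x1(x, shortcut_weight)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8, 8))         # block input with 32 channels
fx = rng.normal(size=(64, 8, 8))        # stand-in for the conv6 output (64 channels)
w_short = rng.normal(size=(64, 32))     # assumed 1x1 shortcut weights

out = residual_add(x, fx, w_short)      # shortcut maps 32 -> 64 channels
print(out.shape)

# With matching channel counts, no shortcut operator is used.
x_same = rng.normal(size=(64, 8, 8))
out_same = residual_add(x_same, fx)
```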
