Shapley Pruning for Neural Network Compression

Kamil Adamczewski; Yawei Li; Luc van Gool

TL;DR

This work reframes neural network pruning as a coalitional game, using the Shapley value to quantify each neuron's average marginal contribution to network performance via the coalition payoff $u(K)$. It introduces three practical approximations (partial Shapley value, averaging permutations, and weighted least-squares regression) and a new Oracle ranking benchmark to assess ranking quality. Empirical results across LeNet-5, VGG-16, ResNet-56, and ResNet-50 demonstrate that Shapley-based pruning yields state-of-the-art compression, achieving substantial reductions in FLOPs and parameters while maintaining accuracy. The approach offers a principled, scalable framework for channel-wise pruning with strong performance under realistic computational budgets.
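
For reference, the standard textbook form of the Shapley value can be written with the payoff $u(K)$ as follows. The symbols $N$ (the set of all neurons in the game) and $n = |N|$ are notational assumptions made here; the paper's own notation may differ.

$$\varphi_i = \sum_{K \subseteq N \setminus \{i\}} \frac{|K|!\,(n-|K|-1)!}{n!}\,\bigl(u(K \cup \{i\}) - u(K)\bigr)$$

Each term is the marginal contribution of neuron $i$ to a coalition $K$ that does not contain it, weighted by the fraction of orderings of $N$ in which exactly the members of $K$ precede $i$.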

Abstract

Neural network pruning is a rich field with a variety of approaches. In this work, we connect existing pruning concepts, such as leave-one-out pruning and oracle pruning, and develop them into a more general Shapley value-based framework that targets the compression of convolutional neural networks. To make the Shapley value usable in practice, this work presents Shapley value approximations and compares them in terms of cost-benefit utility for neural network compression. The proposed rankings are evaluated against a new benchmark, the Oracle rank, constructed from oracle sets. Broad experiments show that the proposed normative ranking and its approximations are practical and obtain state-of-the-art network compression.
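
To give a feel for why approximations are needed, the sketch below estimates Shapley values by averaging marginal contributions over a limited number of randomly sampled orderings instead of all $n!$ of them. This is generic Monte Carlo permutation sampling, not necessarily the exact scheme used in the paper. The function `sampled_shapley`, the `WEIGHTS` table, and `toy_payoff` are hypothetical names and values introduced only for illustration; in a pruning setting the payoff would be the measured accuracy of the network restricted to the given channels.

```python
import random

def sampled_shapley(nodes, payoff, num_permutations, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the nodes."""
    rng = random.Random(seed)
    totals = {i: 0.0 for i in nodes}
    for _ in range(num_permutations):
        order = list(nodes)
        rng.shuffle(order)
        coalition = frozenset()
        previous = payoff(coalition)
        for node in order:
            # Grow the coalition by one node and record the payoff change.
            coalition = coalition | {node}
            current = payoff(coalition)
            totals[node] += current - previous
            previous = current
    return {i: totals[i] / num_permutations for i in nodes}

# Hypothetical per-channel contributions (assumption, not from the paper).
WEIGHTS = {0: 5.0, 1: 1.0, 2: 3.0, 3: 0.5}

def toy_payoff(coalition):
    """Stand-in for 'accuracy of the network keeping only these channels'."""
    score = 10.0 + sum(WEIGHTS[i] for i in coalition)
    if 0 in coalition and 2 in coalition:
        score += 4.0  # assumed interaction: channels 0 and 2 help each other
    return score

estimates = sampled_shapley(list(WEIGHTS), toy_payoff, num_permutations=200)
# Channels with the smallest estimated value are the first pruning candidates.
pruning_order = sorted(estimates, key=estimates.get)
```

Because every sampled ordering ends at the full coalition, the estimates always sum to the payoff of the full set minus the payoff of the empty set, regardless of how few orderings are drawn. Only the split between interacting channels (here 0 and 2) is subject to sampling noise.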

Paper Structure (18 sections, 10 equations, 2 figures, 4 tables)

Related work
Derivative pruning.
The Shapley value.
Problem formulation
Game theoretical neuron ranking
Coalitional game theory
Shapley value
Three approximations of the Shapley value
Approximation via partial Shapley Value
Approximation via averaging permutations
Approximation via weighted least-squares regression
Experiments
Oracle ranking benchmark
Shapley value approximation schemes
Compression
...and 3 more sections

Figures (2)

Figure 1: Visual comparison of the heatmaps [selvaraju2017grad] produced by three models for ResNet-50 on ImageNet. The top-left one is the original model, the top-right one prunes the least important nodes according to the Shapley ranking (what we want), and the bottom one prunes the most important nodes according to the Shapley ranking (what we do not want). The red color indicates the elements the network focuses on during classification. Shapley pruning ranks important and unimportant nodes correctly and in an interpretable way.
Figure 2: An example of computing the Shapley value of node 1, $\varphi_1$, in a neural network according to the definition in Eq. \ref{eq:shap_perm_approx}. We consider a single layer with three nodes (numbered 1, 2, and 3) and compute the marginal contribution of node 1 in each of the $3!$ permutations of all the nodes. The bold nodes represent coalitions. A coalition is formed by appending nodes from left to right. The upper row shows the coalitions with node 1, and the lower row shows the corresponding coalitions without node 1. The average contribution is then $\varphi_{1}=\frac{45+45+15+35+5+5}{3!}=\frac{150}{6}=25$. The percentages give the value of the characteristic function, that is, the accuracy of the network containing only the coalition nodes. The accuracy of the full network is 90%, and with all the nodes removed it is 10%. By performing similar computations, we obtain $\varphi_2=25$ and $\varphi_3=30.3$. This indicates that node 3 contributes the most to the network on average and, according to Shapley Oracle pruning, is the most important node in the network.
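
The permutation-based computation described in the Figure 2 caption can be sketched in a few lines of Python. The coalition accuracies in `u` are made-up values, not taken from the paper: they are chosen only so that the empty and full coalitions score 10% and 90% and so that node 1's six marginal contributions come out as 45, 45, 15, 35, 5, and 5, as in the caption. The resulting values for nodes 2 and 3 depend on these assumptions and are not meant to reproduce the figure. The name `shapley_values` is likewise hypothetical.

```python
from itertools import permutations

# Hypothetical accuracies (in %) of the network restricted to each coalition.
u = {
    frozenset(): 10,
    frozenset({1}): 55,
    frozenset({2}): 45,
    frozenset({3}): 40,
    frozenset({1, 2}): 60,
    frozenset({1, 3}): 75,
    frozenset({2, 3}): 85,
    frozenset({1, 2, 3}): 90,
}

def shapley_values(nodes, payoff):
    """Exact Shapley values: average each node's marginal contribution
    over every ordering in which the coalition can be built up."""
    totals = {i: 0.0 for i in nodes}
    orders = list(permutations(nodes))
    for order in orders:
        coalition = frozenset()
        for node in order:
            grown = coalition | {node}
            # Marginal contribution of `node` when appended to `coalition`.
            totals[node] += payoff[grown] - payoff[coalition]
            coalition = grown
    return {i: totals[i] / len(orders) for i in nodes}

phi = shapley_values([1, 2, 3], u)
print(phi[1])  # node 1: (45 + 45 + 15 + 35 + 5 + 5) / 6
```

Enumerating all orderings costs $n!$ payoff sweeps, which is feasible for three nodes but not for a real layer with hundreds of channels; this is the motivation for approximating the value rather than computing it exactly.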
