Shapley Pruning for Neural Network Compression

Kamil Adamczewski, Yawei Li, Luc van Gool

TL;DR

This work reframes neural network pruning as a coalitional game, using the Shapley value to quantify each neuron's average marginal contribution to network performance via the coalition payoff $\nu(K)$. It introduces three practical approximations (partial Shapley value, averaging permutations, and weighted least-squares regression) and a new Oracle ranking benchmark to assess ranking quality. Empirical results across LeNet-5, VGG-16, ResNet-56, and ResNet-50 demonstrate that Shapley-based pruning yields state-of-the-art compression, achieving substantial reductions in FLOPs and parameters while maintaining accuracy. The approach offers a principled, scalable framework for channel-wise pruning with strong performance under realistic computational budgets.
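
The averaging-permutations approximation can be illustrated, assuming it works like the standard Monte Carlo permutation estimator, by drawing random orderings of the nodes, adding the nodes one at a time, and averaging each node's change in payoff. This is a hypothetical sketch, not the authors' code: `shapley_permutation_estimate` and the `payoff` callable (standing in for the coalition payoff, e.g. validation accuracy with only the coalition's nodes kept) are invented names, and the additive toy game is made up.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical payoff type: maps a coalition (set of kept node indices) to a
# score such as validation accuracy with all other nodes masked out.
Payoff = Callable[[FrozenSet[int]], float]


def shapley_permutation_estimate(
    nodes: Sequence[int], payoff: Payoff, num_permutations: int, seed: int = 0
) -> List[float]:
    """Monte Carlo Shapley estimate: average each node's marginal contribution
    over randomly sampled orderings of the nodes."""
    rng = random.Random(seed)
    index = {node: i for i, node in enumerate(nodes)}
    totals = [0.0] * len(nodes)
    for _ in range(num_permutations):
        order = list(nodes)
        rng.shuffle(order)  # one random ordering of all nodes
        coalition: FrozenSet[int] = frozenset()
        previous = payoff(coalition)
        for node in order:
            coalition = coalition | {node}  # append the next node
            current = payoff(coalition)
            totals[index[node]] += current - previous  # marginal contribution
            previous = current
    return [total / num_permutations for total in totals]


if __name__ == "__main__":
    # Toy additive game (hypothetical): each node adds a fixed accuracy gain,
    # so every ordering yields the same marginal contributions.
    gains = {0: 5.0, 1: 20.0, 2: 55.0}
    estimate = shapley_permutation_estimate(
        [0, 1, 2], lambda k: 10.0 + sum(gains[n] for n in k), num_permutations=50
    )
    print(estimate)  # → [5.0, 20.0, 55.0]
```

Each sampled ordering costs one payoff evaluation per node plus one for the empty coalition, so the number of permutations trades estimate quality against compute. Because the marginal contributions along any ordering telescope to v(full) − v(empty), the estimates always sum to that gain, even with few samples.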

Abstract

Neural network pruning is a rich field with a variety of approaches. In this work, we propose to connect existing pruning concepts, such as leave-one-out pruning and oracle pruning, and develop them into a more general Shapley value-based framework that targets the compression of convolutional neural networks. To make the Shapley value usable in practice, this work presents approximations of it and compares them in terms of their cost-benefit utility for neural network compression. The proposed ranks are evaluated against a new benchmark, the Oracle rank, constructed from oracle sets. Extensive experiments show that the proposed normative ranking and its approximations are practical and obtain state-of-the-art network compression.
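
One of the approximations named in the TL;DR, weighted least-squares regression, can be sketched in the style of Kernel SHAP: regress coalition payoffs on 0/1 membership indicators using the Shapley kernel weights, while forcing the values to sum to v(full) − v(empty). This is a hypothetical illustration under that assumption; the paper's weighting and sampling scheme may differ, and `shapley_weighted_least_squares` and `shapley_kernel_weight` are invented names. For brevity the sketch enumerates every coalition, in which case the regression returns the exact Shapley values; a practical approximation would sample coalitions instead.

```python
import itertools
from math import comb
from typing import Callable, FrozenSet, List

import numpy as np

# Hypothetical payoff type: coalition of kept node indices -> score (e.g. accuracy).
Payoff = Callable[[FrozenSet[int]], float]


def shapley_kernel_weight(num_nodes: int, size: int) -> float:
    """Kernel SHAP weight for a coalition of `size` nodes, 0 < size < num_nodes."""
    return (num_nodes - 1) / (comb(num_nodes, size) * size * (num_nodes - size))


def shapley_weighted_least_squares(num_nodes: int, payoff: Payoff) -> List[float]:
    """Shapley values via weighted least squares over all proper, non-empty
    coalitions, with efficiency enforced by eliminating the last node's value."""
    empty_value = payoff(frozenset())
    total_gain = payoff(frozenset(range(num_nodes))) - empty_value
    if num_nodes == 1:
        return [total_gain]
    rows, targets, weights = [], [], []
    for size in range(1, num_nodes):
        for members in itertools.combinations(range(num_nodes), size):
            z = np.zeros(num_nodes)
            z[list(members)] = 1.0  # membership indicator of the coalition
            # Substituting phi_last = total_gain - sum(phi_others) into
            # v(S) - v(empty) = sum_i phi_i * z_i gives a model in phi_others.
            rows.append(z[:-1] - z[-1])
            targets.append(payoff(frozenset(members)) - empty_value - z[-1] * total_gain)
            weights.append(shapley_kernel_weight(num_nodes, size))
    sqrt_w = np.sqrt(np.array(weights))
    design = np.array(rows) * sqrt_w[:, None]  # scale rows by sqrt(weight)
    response = np.array(targets) * sqrt_w
    phi_others, *_ = np.linalg.lstsq(design, response, rcond=None)
    return phi_others.tolist() + [total_gain - float(phi_others.sum())]


if __name__ == "__main__":
    gains = [5.0, 20.0, 55.0]  # hypothetical additive game
    phi = shapley_weighted_least_squares(3, lambda k: 10.0 + sum(gains[i] for i in k))
    print([round(p, 6) for p in phi])  # → [5.0, 20.0, 55.0]
```

Eliminating the last node's value makes the efficiency constraint hold exactly, instead of approximating it with very large weights on the empty and full coalitions.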

Paper Structure (18 sections, 10 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Visual comparison of the heatmaps produced by three different models for ResNet-50 on ImageNet [selvaraju2017grad]. The top-left heatmap comes from the original model; the top-right one from a model that prunes the least important nodes according to the Shapley ranking (what we want); the bottom one from a model that prunes the most important nodes according to the Shapley ranking (what we do not want). Red indicates the elements the network focuses on during classification. Shapley pruning ranks important and unimportant nodes correctly and in an interpretable way.
  • Figure 2: An example of computing the Shapley value of node 1, $\varphi_1$, in a neural network according to the definition in Eq. \ref{eq:shap_perm_approx}. We consider a single layer with three nodes (numbered 1, 2 and 3) and compute the marginal contribution of node 1 in each of the $3! = 6$ permutations of the nodes. The bold nodes represent coalitions, which are formed by appending nodes from left to right. The upper row shows the coalitions with node 1, and the lower row the corresponding coalitions without node 1. The average contribution is then $\varphi_1 = \frac{45 + 45 + 15 + 35 + 5 + 5}{3!} = \frac{150}{6} = 25$. The percentages give the characteristic function, that is, the accuracy of the network containing only the coalition's nodes. The accuracy of the full network is 90%, and with all nodes removed it is 10%. Similar computations give $\varphi_2 = 25$ and $\varphi_3 = 30$; as the efficiency property of the Shapley value requires, the three values sum to $90 - 10 = 80$. This indicates that node 3 contributes the most to the network on average, so Shapley Oracle pruning would rank it as the most important node.
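
The arithmetic in the Figure 2 caption can be reproduced with a small exact computation over all $3! = 6$ orderings. This is a minimal sketch under stated assumptions: `exact_shapley`, `nodes_to_prune` and `HYPOTHETICAL_ACCURACY` are invented names, and because the caption lists only node 1's marginal contributions, the coalition accuracies are a hypothetical table chosen to reproduce those six contributions (45, 45, 15, 35, 5, 5), not the values drawn in the figure. The last step mimics the top-right model of Figure 1 by selecting the lowest-ranked nodes for pruning.

```python
import itertools
from typing import Callable, Dict, FrozenSet, List, Sequence

Payoff = Callable[[FrozenSet[int]], float]

# Hypothetical coalition accuracies (%) for a 3-node layer. These are NOT the
# values drawn in Figure 2; they were chosen so that node 1's marginal
# contributions are 45, 45, 15, 35, 5 and 5, as listed in the caption.
HYPOTHETICAL_ACCURACY: Dict[FrozenSet[int], float] = {
    frozenset(): 10.0,
    frozenset({1}): 55.0,
    frozenset({2}): 50.0,
    frozenset({3}): 45.0,
    frozenset({1, 2}): 65.0,
    frozenset({1, 3}): 80.0,
    frozenset({2, 3}): 85.0,
    frozenset({1, 2, 3}): 90.0,
}


def exact_shapley(nodes: Sequence[int], payoff: Payoff) -> Dict[int, float]:
    """Exact Shapley values: average marginal contribution over all n! orderings."""
    totals = {node: 0.0 for node in nodes}
    orderings = list(itertools.permutations(nodes))
    for order in orderings:
        coalition: FrozenSet[int] = frozenset()
        for node in order:  # build the coalition left to right
            with_node = coalition | {node}
            totals[node] += payoff(with_node) - payoff(coalition)
            coalition = with_node
    return {node: total / len(orderings) for node, total in totals.items()}


def nodes_to_prune(shapley: Dict[int, float], num_pruned: int) -> List[int]:
    """Pick the `num_pruned` nodes with the smallest Shapley values (ties keep input order)."""
    return sorted(shapley, key=shapley.get)[:num_pruned]


if __name__ == "__main__":
    phi = exact_shapley([1, 2, 3], HYPOTHETICAL_ACCURACY.__getitem__)
    print(phi)  # → {1: 25.0, 2: 25.0, 3: 30.0}
    print(nodes_to_prune(phi, 2))  # → [1, 2]
```

Exact enumeration needs n! orderings, which quickly becomes infeasible for realistic layer widths and is what motivates the approximations.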