Neuron Shapley: Discovering the Responsible Neurons

Amirata Ghorbani, James Zou

TL;DR

Neuron Shapley quantifies the contribution of individual neurons in a principled way, using Shapley values to account for interactions between neurons. A truncated multi-armed bandit algorithm (TMAB-Shapley) estimates these values efficiently in large CNNs, making it feasible to analyze tens of thousands of filters. The key finding is that a very small subset of filters largely controls accuracy, fairness, and robustness, so a model can be repaired quickly after training by zeroing out the responsible filters, without retraining. The framework offers a rigorous, transferable approach to interpretation and repair; the authors acknowledge its computational cost and leave room for iterative, contribution-guided retraining.
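For reference, the standard Shapley value that the summary above relies on can be written as follows. The notation is an assumption made here for illustration and may differ from the paper's: $N$ is the set of neurons (filters), $V(S)$ is the model's performance when only the neurons in $S$ are kept active, and $\phi_i$ is the contribution assigned to neuron $i$.

```latex
\phi_i(V) \;=\; \frac{1}{|N|} \sum_{S \subseteq N \setminus \{i\}}
  \binom{|N|-1}{|S|}^{-1} \bigl[\, V(S \cup \{i\}) - V(S) \,\bigr]
```

Each term is the marginal gain from adding neuron $i$ to a subset $S$, averaged over all subsets. This averaging is what lets the score reflect interactions between neurons, and it is also why exact computation is infeasible for large networks: the sum has $2^{|N|-1}$ terms.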

Abstract

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network. By accounting for interactions across neurons, Neuron Shapley is more effective in identifying important filters compared to common approaches based on activation patterns. Interestingly, removing just 30 filters with the highest Shapley scores effectively destroys the prediction accuracy of Inception-v3 on ImageNet. Visualization of these few critical filters provides insights into how the network functions. Neuron Shapley is a flexible framework and can be applied to identify responsible neurons in many tasks. We illustrate additional applications of identifying filters that are responsible for biased prediction in facial recognition and filters that are vulnerable to adversarial attacks. Removing these filters is a quick way to repair models. Enabling all these applications is a new multi-arm bandit algorithm that we developed to efficiently estimate Neuron Shapley values.
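To make the estimation idea concrete, here is a minimal sketch of truncated Monte Carlo Shapley estimation on a toy problem. It is not the authors' TMAB-Shapley algorithm: the bandit-style adaptive sampling is omitted, and the names `truncated_mc_shapley` and `toy_value`, the four-unit "network", and the `floor` parameter are all assumptions made for illustration. Units are removed in a random order, each unit is credited with the performance drop its removal causes, and a permutation is cut short once performance reaches the floor.

```python
import random


def toy_value(active):
    """Hypothetical performance of a 4-unit model given the set of active units.

    Units 0 and 1 only help jointly (an interaction), unit 2 helps on its own,
    and unit 3 contributes nothing.
    """
    score = 0.0
    if 0 in active and 1 in active:
        score += 0.5
    if 2 in active:
        score += 0.125
    return score


def truncated_mc_shapley(n_units, value_fn, n_perms=2000, floor=0.0, seed=0):
    """Estimate Shapley values by averaging marginal drops over random removal orders."""
    rng = random.Random(seed)
    totals = [0.0] * n_units
    order = list(range(n_units))
    for _ in range(n_perms):
        rng.shuffle(order)
        active = set(order)
        prev = value_fn(active)
        for unit in order:
            if prev <= floor:
                # Truncation: performance is already at the floor, so the
                # remaining marginal contributions are treated as zero.
                break
            active.remove(unit)
            cur = value_fn(active)
            totals[unit] += prev - cur
            prev = cur
    return [t / n_perms for t in totals]


phi = truncated_mc_shapley(4, toy_value)
for unit, value in enumerate(phi):
    print(f"unit {unit}: {value:.3f}")
```

In this toy setting the two interacting units split their joint 0.5 roughly evenly, which a score based on a single unit's activation alone would not capture. Truncation is exact here because the toy performance can never rise again once it hits the floor; in a real network it is an approximation.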

Paper Structure

This paper contains 19 sections, 13 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Visualizing filters critical for overall accuracy. We visualize the highest Shapley value filters for a select few Inception-v3 blocks. For each filter, we show 5 example images that activate that neuron most positively or most negatively. Additionally, we optimize a random input to highly activate (positively or negatively) the selected neuron. These filters can have meaningful interpretations, which we write on the left. Earlier layer filters extract simple features like color or pattern. As we go deeper, filters capture sophisticated features like crowdedness or how much color is in the image. On the bottom, we show how many of the top-$100$ contributing filters appear in each layer.
  • Figure 2: Class-specific critical neurons. (a) Removing filters with the highest class-specific Shapley values (blue dash) reduces the class prediction accuracy more effectively than removing filters identified by other approaches. We select four representative classes to show. (b) We visualize two critical filters for each class by showing the top 5 most positively activating images along with the deep dream visualization of the filter. (c) Class-specific filters are more common in the deeper layers.
  • Figure 3: (a) The y-axis shows the number of filters that required the given number of samples (x-axis) under each algorithm. Using Alg. 1 (TMAB-Shapley), most filters require around $10$ times fewer samples compared to MC-Shapley. (b) After computing each filter's contribution to the model's fair performance, we remove the ones with the most negative contribution. The model improves, especially for black females (BF). The four populations are white female (WF), black female (BF), white male (WM) and black male (BM). (c) We compute each filter's contribution to the adversary's success rate. After removing the most contributing neurons, the adversary is much less successful (black), and the model becomes able to detect a large portion of adversarially perturbed images as their true class (red).
  • Figure 4: Filter dropout effect. The Neuron Shapley results for two SqueezeNet models trained on the CelebA dataset, one trained with filter dropout and one without. (a) The histogram of values for the two models. (b) Removing the 30 highest value neurons shows that the dropout-trained model is more robust. (c) Removing filters with the lowest values shows that the dropout-trained model is robust to the removal of almost half of the filters. The CelebA test set has a 40-60 class imbalance; therefore we see the sharp drop from $60\%$ accuracy to $40\%$ accuracy.
  • Figure 5: Truncation. The figure shows the performance of the two models used throughout this paper as filters are removed randomly (100 removal trajectories). For the Inception-v3 model, removing around $10\%$ of filters breaks performance. For SqueezeNet, the same happens after removing around $20\%$ of filters.
  • ...and 3 more figures

Theorems & Definitions (1)

  • proof