Neuron-based explanations of neural networks sacrifice completeness and interpretability

Nolan Dey, Eric Taylor, Alexander Wong, Bryan Tripp, Graham W. Taylor

TL;DR

The paper examines the explainability of deep networks by comparing neuron-based explanations with explanations built on principal components (PCs) of the activations. For AlexNet pretrained on ImageNet, a small set of high-variance PCs captures most of the activation variance and strongly influences network function, whereas the neuron basis needs many more units to cover similar variance. A human interpretability study finds that PC-based dimensions yield more coherent and interpretable visualizations than individual neurons, especially in mid-to-deep layers. The authors argue for population-level, dimension-reduced explanations over neuron-centric ones, and suggest non-linear extensions and analyses of other architectures as future work.

Abstract

High-quality explanations of neural networks (NNs) should exhibit two key properties: completeness ensures that they accurately reflect a network's function, and interpretability makes them understandable to humans. Many existing methods provide explanations of individual neurons within a network. In this work we provide evidence that, for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability compared to activation principal components (PCs). Neurons are a poor basis for AlexNet embeddings because they don't account for the distributed nature of these representations. By examining two quantitative measures of completeness and conducting a user study to measure interpretability, we show that the most important PCs provide more complete and interpretable explanations than the most important neurons. Much of the activation variance may be explained by examining relatively few high-variance PCs, as opposed to studying every neuron. These PCs also strongly affect network function and are significantly more interpretable than neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings and instead choose a basis, such as principal components, which accounts for the high-dimensional and distributed nature of a network's internal representations. Interactive demo and code available at https://ndey96.github.io/neuron-explanations-sacrifice.
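
To make the variance claim concrete, the following is a minimal sketch (not the authors' code) that compares how much activation variance the top few neurons and the top few PCs explain. It uses synthetic activations in place of real AlexNet activations; the function name `explained_variance_ratios`, the latent-factor data model, and all sizes are assumptions chosen for illustration only.

```python
import numpy as np


def explained_variance_ratios(activations):
    """Return (neuron_ratios, pc_ratios), each sorted in descending order.

    activations: array of shape (n_samples, n_neurons).
    """
    # Covariance between neurons; np.cov mean-centers the data itself.
    cov = np.cov(activations, rowvar=False)
    # Neuron basis: variance along each coordinate axis (the diagonal).
    neuron_var = np.sort(np.diag(cov))[::-1]
    # PC basis: eigenvalues of the covariance matrix, clipped to guard
    # against tiny negative values from floating-point error.
    pc_var = np.clip(np.sort(np.linalg.eigvalsh(cov))[::-1], 0.0, None)
    return neuron_var / neuron_var.sum(), pc_var / pc_var.sum()


# Assumption: synthetic stand-in for a layer's activations. Three latent
# factors are spread across 64 "neurons", so the signal is distributed
# rather than aligned with any single neuron.
rng = np.random.default_rng(0)
n_samples, n_neurons, n_latent = 2000, 64, 3
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_neurons))
noise = 0.1 * rng.normal(size=(n_samples, n_neurons))
activations = latent @ mixing + noise

neuron_ratio, pc_ratio = explained_variance_ratios(activations)
top3_neurons = neuron_ratio[:3].sum()
top3_pcs = pc_ratio[:3].sum()
print(f"top 3 neurons explain {top3_neurons:.1%} of the variance")
print(f"top 3 PCs explain {top3_pcs:.1%} of the variance")
```

Both bases span the same space and account for the same total variance; they differ only in how quickly the leading basis vectors accumulate it. In this toy setup the representation is distributed by construction, so the result illustrates the argument rather than providing evidence about AlexNet.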

Paper Structure

This paper contains 21 sections, 5 equations, 20 figures.

Figures (20)

  • Figure 1: Overview of our methodology. We sample $\mathbb{R}^d$ activations from a layer (1), then identify basis vectors of the layer's activation space (e.g. neurons or PCs) (2). Finally we visualize points along each basis vector (3) and interpret the visualizations (4).
  • Figure 2: Visualizations of points along the 5 highest variance PCs and 5 highest variance neurons for AlexNet's conv1 layer.
  • Figure 3: Visualizations of points along the 5 highest variance PCs and 5 highest variance neurons for AlexNet's conv2 layer.
  • Figure 4: Visualizations of points along the 5 highest variance PCs and 5 highest variance neurons for AlexNet's fc1 layer.
  • Figure 5: Cumulative sum of explained variance ratio for each AlexNet layer plotted against the number of basis vectors being used. Both PCs and neurons are ordered by descending variance. The number of basis vectors required to explain 80% and 99% variance is annotated.
  • ...and 15 more figures
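
The annotation described in the Figure 5 caption (how many basis vectors are needed to reach 80% or 99% explained variance) can be sketched as a short helper. This is an illustrative sketch rather than the paper's implementation: the function name `num_basis_vectors_needed` and the toy variance values are hypothetical and are not results from the paper.

```python
from itertools import accumulate


def num_basis_vectors_needed(variances, threshold):
    """Smallest number of basis vectors, taken in descending-variance
    order, whose cumulative explained variance ratio reaches threshold."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return count
    return len(ordered)


# Hypothetical per-basis-vector variances (both sum to 100).
pc_variances = [85, 10, 4.5, 0.25, 0.25]   # concentrated in a few PCs
neuron_variances = [24, 22, 20, 18, 16]    # spread evenly across neurons

for name, variances in [("PCs", pc_variances), ("neurons", neuron_variances)]:
    n80 = num_basis_vectors_needed(variances, 0.80)
    n99 = num_basis_vectors_needed(variances, 0.99)
    print(f"{name}: {n80} for 80% variance, {n99} for 99% variance")
```

With these toy numbers, the concentrated spectrum reaches 80% with a single basis vector, while the evenly spread one needs four of its five. That is the kind of gap the annotated counts in the figure are meant to expose.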