Network Dissection: Quantifying Interpretability of Deep Visual Representations

David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

TL;DR

This paper tackles the interpretability of deep visual representations by quantifying alignment between individual CNN units and a wide set of semantic concepts using a unified dataset. It introduces Network Dissection, a three-step scoring framework leveraging the Broden dataset to label unit semantics and measure layer interpretability as the number of unique concept detectors. Through extensive experiments across architectures, supervision schemes, and training conditions, it demonstrates that interpretability is axis-dependent and can be degraded by basis rotations or batch normalization, even when discriminative power remains intact. The findings show that deeper networks and scene-focused supervision tend to yield more interpretable units, while widening layers can increase interpretability up to a limit, offering practical guidance for building more transparent CNNs.

Abstract

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
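
The scoring idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a unit's activation maps have already been resized to the resolution of the concept masks, that alignment is measured with intersection over union (IoU), and that a unit counts as a detector for its best-matching concept when the IoU exceeds a cutoff. The function names and the default values `top_quantile=0.005` and `iou_threshold=0.04` are not stated in this summary and are assumptions.

```python
import numpy as np


def unit_concept_iou(activation_maps, concept_masks, top_quantile=0.005):
    """Score one unit against one concept over a whole dataset.

    activation_maps: float array, shape (num_images, H, W), assumed to be
        already resized to the mask resolution.
    concept_masks: bool array, same shape, pixel-wise concept annotation.
    top_quantile: fraction of activations treated as "on" (assumed value).
    """
    # One threshold per unit, computed over all images and locations.
    threshold = np.quantile(activation_maps, 1.0 - top_quantile)
    unit_masks = activation_maps > threshold

    # Dataset-wide intersection over union between the two binary masks.
    intersection = np.logical_and(unit_masks, concept_masks).sum()
    union = np.logical_or(unit_masks, concept_masks).sum()
    return float(intersection) / float(union) if union > 0 else 0.0


def count_unique_detectors(iou_table, concept_names, iou_threshold=0.04):
    """Count distinct concepts that have at least one matching unit.

    iou_table: float array, shape (num_units, num_concepts).
    iou_threshold: cutoff above which a unit is called a detector
        (assumed value).
    """
    best_concept = iou_table.argmax(axis=1)
    best_score = iou_table.max(axis=1)
    labels = {
        concept_names[c]
        for c, score in zip(best_concept, best_score)
        if score > iou_threshold
    }
    return len(labels)
```

Counting distinct labels rather than labeled units means that several units matching the same concept contribute only once, which matches the "number of unique concept detectors" measure named in the TL;DR.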

Paper Structure

This paper contains 14 sections, 1 equation, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Unit 13 in zhou2014object (classifying places) detects table lamps. Unit 246 in gonzalez2016semantic (classifying objects) detects bicycle wheels. A unit in vondrick2016generating (self-supervised for generating videos) detects people.
  • Figure 2: Samples from the Broden Dataset. The ground truth for each concept is a pixel-wise dense annotation.
  • Figure 3: Illustration of network dissection for measuring semantic alignment of units in a given CNN. Here one unit of the last convolutional layer of a given CNN is probed by evaluating its performance on 1197 segmentation tasks. Our method can probe any convolutional layer.
  • Figure 4: Interpretability over changes in basis of the representation of AlexNet conv5 trained on Places. The vertical axis shows the number of unique interpretable concepts that match a unit in the representation. The horizontal axis shows $\alpha$, which quantifies the degree of rotation.
  • Figure 5: A comparison of the interpretability of all five convolutional layers of AlexNet, as trained on classification tasks for Places (top) and ImageNet (bottom). At right, three examples of units in each layer are shown with identified semantics. The segmentation generated by each unit is shown on the three Broden images with highest activation. Top-scoring labels are shown above to the left, and human-annotated labels are shown above to the right. Some disagreement can be seen for the dominant judgment of meaning. For example, human annotators mark the first conv4 unit on Places as a 'windows' detector, while the algorithm matches the 'chequered' texture.
  • ...and 8 more figures
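
The change of basis in Figure 4 can be illustrated with a short sketch. It assumes, as a reading not spelled out in the caption, that $\alpha$ interpolates between the identity ($\alpha = 0$) and a random rotation $Q$ ($\alpha = 1$) through the fractional power $Q^\alpha$. The helper names `random_rotation` and `fractional_rotation` are hypothetical, and the eigendecomposition route is just one convenient way to compute a fractional power with NumPy alone.

```python
import numpy as np


def random_rotation(n, seed=0):
    """Draw a random n x n rotation matrix (orthogonal, determinant +1)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(q) < 0:
        # Flip one column so the matrix is a proper rotation.
        q[:, 0] = -q[:, 0]
    return q


def fractional_rotation(q, alpha):
    """Compute Q**alpha for a rotation matrix Q.

    Eigenvalues of a rotation lie on the unit circle, so each angle is
    scaled by alpha. Assumes Q has no eigenvalue exactly equal to -1.
    """
    eigvals, eigvecs = np.linalg.eig(q)
    angles = np.angle(eigvals)
    scaled = np.diag(np.exp(1j * alpha * angles))
    powered = eigvecs @ scaled @ np.linalg.inv(eigvecs)
    # The imaginary part is numerical noise for a real rotation matrix.
    return powered.real


# Rotate a (num_samples, num_units) feature matrix part of the way to Q.
q = random_rotation(4, seed=0)
features = np.random.default_rng(1).standard_normal((10, 4))
rotated = features @ fractional_rotation(q, 0.5)
```

Because each intermediate matrix is itself a rotation, the rotated features keep the same pairwise distances and norms as the originals. That is consistent with the TL;DR's point that a rotation can reduce interpretability while leaving discriminative power intact.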