SPADE: Sparsity-Guided Debugging for Deep Neural Networks

Arshia Soltani Moakhar, Eugenia Iofinova, Elias Frantar, Dan Alistarh

TL;DR

SPADE addresses interpretability by introducing sample-specific sparsity as a preprocessing step, producing a sparse trace Sparse(M,s) that preserves the original model's outputs on the target input. By solving per-layer sparsity constraints in one shot on augmented copies of the sample, SPADE disambiguates polysemantic neuron activations, so that conventional saliency and visualization methods yield more faithful, sample-relevant explanations. Across ImageNet, CelebA, and Food-101, SPADE consistently improves saliency-map accuracy (average gains of several percent) and makes neuron visualizations easier for humans to interpret, without retraining the model or modifying inference. The approach is practical (fast per-sample runtimes), compatible with diverse architectures and interpretability methods, and opens a data-driven path toward more faithful, sample-local model debugging and explanation.

Abstract

It is known that sparsity can improve interpretability for deep neural networks. However, existing methods in the area either require networks that are pre-trained with sparsity constraints, or impose sparsity after the fact, altering the network's general behavior. In this paper, we demonstrate, for the first time, that sparsity can instead be incorporated into the interpretation process itself, as a sample-specific preprocessing step. Unlike previous work, this approach, which we call SPADE, does not place constraints on the trained model and does not affect its behavior during inference on the sample. Given a trained model and a target sample, SPADE uses sample-targeted pruning to provide a "trace" of the network's execution on the sample, reducing the network to the most important connections prior to computing an interpretation. We demonstrate that preprocessing with SPADE significantly increases the accuracy of image saliency maps across several interpretability methods. Additionally, SPADE improves the usefulness of neuron visualizations, aiding humans in reasoning about network behavior. Our code is available at https://github.com/IST-DASLab/SPADE.
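
The idea of sample-targeted pruning can be illustrated on a single linear layer. The following is a minimal sketch, not the authors' implementation: the pruning criterion SPADE uses is not described in this section, so the sketch swaps in a simple stand-in score (weight magnitude times mean input magnitude over noisy copies of the sample). The names `augment`, `layer_output`, and `prune_layer_for_sample`, the Gaussian-noise augmentation, and the 50% sparsity level are all assumptions made for illustration.

```python
import random


def augment(sample, n_aug, noise, rng):
    """Hypothetical augmentation: noisy copies of one input vector."""
    return [[x + rng.gauss(0.0, noise) for x in sample] for _ in range(n_aug)]


def layer_output(weights, x):
    """Output of a bias-free linear layer, one value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def prune_layer_for_sample(weights, batch, sparsity):
    """Return a pruned copy of `weights` tailored to `batch`.

    Stand-in criterion (an assumption, not SPADE's solver): for each
    output row, keep the inputs with the largest |w| * mean|x| score
    and zero out the rest.
    """
    n_in = len(weights[0])
    mean_abs = [sum(abs(x[j]) for x in batch) / len(batch) for j in range(n_in)]
    n_keep = max(1, round(n_in * (1.0 - sparsity)))
    pruned = []
    for row in weights:
        scores = [abs(w) * a for w, a in zip(row, mean_abs)]
        order = sorted(range(n_in), key=lambda j: scores[j], reverse=True)
        keep = set(order[:n_keep])
        pruned.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return pruned


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(16)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]

# Prune against augmented copies of the one sample we want to explain.
batch = augment(sample, n_aug=32, noise=0.05, rng=rng)
trace = prune_layer_for_sample(weights, batch, sparsity=0.5)

# Compare the dense and pruned layer on the original sample.
dense_out = layer_output(weights, sample)
trace_out = layer_output(trace, sample)
print([round(d - t, 3) for d, t in zip(dense_out, trace_out)])
```

The original `weights` are left untouched; only the pruned copy would be handed to a saliency or visualization method. That mirrors the point made in the abstract: the trained model and its inference behavior stay as they are, and sparsity enters only at interpretation time.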

Paper Structure (49 sections, 1 equation, 16 figures, 24 tables, 1 algorithm)

This paper contains 49 sections, 1 equation, 16 figures, 24 tables, 1 algorithm.

Figures (16)

  • Figure 1: Given an input image and model, SPADE prunes the model using image augmentations. The resulting trace (subnetwork) can be used with existing interpretability methods to increase their usefulness and accuracy.
  • Figure 2: SPADE disambiguates feature visualizations and improves the faithfulness of saliency maps. (Left) The "Lighter, igniter" class neuron visualization gives no useful clues as to why the Matchstick and Spotlight images were incorrectly classified into that class. The visualizations obtained with SPADE identify a matchstick head pattern in the first case and a flame pattern in the second, suggesting that these may be spurious features for the Lighter class. (Right) A model implanted with Trojan patches misclassifies a Fox image as a Goose. In this case, we are confident that the heart emoji was entirely responsible for the misclassification, yet the saliency map without SPADE incorrectly assigns high saliency scores to large parts of the fox image. In contrast, the saliency map obtained with SPADE correctly identifies the emoji pixels. Best viewed in color. Further examples are available in the appendix.
  • Figure 3: Two-dimensional example illustrating the effect of SPADE on feature visualization. The feature visualizations (images generated with the method of Olah et al., 2017) are shown as green points; blue and orange points are positive and negative samples, respectively. SPADE Scenario 1 shows the feature visualizations obtained when the red sample is drawn from the larger positive region. Scenario 2 shows the visualizations obtained when the red sample is drawn from the smaller region.
  • Figure F.1: Augmentation samples for the ResNet and MobileNet models on all datasets.
  • Figure F.2: Augmentation samples for the ConvNext model.
  • ...and 11 more figures