Pulling Back the Curtain on Deep Networks

Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń

TL;DR

Semantic Pullbacks (SP) address the instability and perceptual misalignment of gradient-based explanations by introducing a locally averaged adjoint transport through softened layer-wise operators. The method computes a Soft Pullback, adds a Double Pullback for attention, and uses Pullback Ascent to produce localized, class-specific perturbations without altering forward model behavior. Across CNNs and Vision Transformers on Imagenette, SP consistently improves faithfulness metrics (e.g., Infidelity, FaithCorr) and robustness while enabling meaningful counterfactuals, outperforming standard attribution baselines. The approach offers a practical, architecture-agnostic tool for interpreting neural computations and suggests a path-centric view of feature flow in deep networks, with potential extensions to other modalities and future work on biases and adaptive hyperparameters.

Abstract

Post-hoc explainability methods typically associate each output score of a deep neural network with an input-space direction, most commonly instantiated as the gradient and visualized as a saliency map. However, these approaches often yield explanations that are noisy, lack perceptual alignment and, thus, offer limited interpretability. While many explanation methods attempt to address this issue via modified backward rules or additional heuristics, such approaches are often difficult to justify theoretically and frequently fail basic sanity checks. We introduce Semantic Pullbacks (SP), a faithful and effective post-hoc explanation method for deep neural networks. Semantic Pullbacks address the limitations above by isolating the network's effective linear action via a principled pullback formulation and refining it to recover coherent local structures learned by the target neuron. As a result, SP produces perceptually aligned, class-conditional explanations that highlight meaningful features, support compelling counterfactual perturbations, and admit a clear theoretical motivation. Across standard faithfulness benchmarks, Semantic Pullbacks significantly outperform established attribution methods on both classical convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), while remaining general and computationally efficient. Our method can be easily plugged into existing deep learning pipelines and extended to other modalities.
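
For orientation, the baseline that the abstract contrasts against (the plain gradient, read as a pullback of an output covector to input space) can be sketched as follows. This is a minimal illustration of vanilla gradient saliency only, not of Semantic Pullbacks; the two-layer ReLU network, its weights `W1` and `W2`, and the helper names are all made up for the example.

```python
# Toy two-layer ReLU network: logits = W2 @ relu(W1 @ x).
# All weights are hypothetical and chosen only for illustration.
W1 = [[1.0, -2.0, 0.5],
      [0.5, 1.0, -1.0],
      [-1.0, 0.5, 2.0],
      [2.0, 1.0, 1.0]]
W2 = [[1.0, -1.0, 0.5, 2.0],
      [-0.5, 2.0, 1.0, -1.0]]


def matvec(M, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def forward(x):
    """Return hidden pre-activations and output logits."""
    pre = matvec(W1, x)
    hidden = [max(0.0, p) for p in pre]
    return pre, matvec(W2, hidden)


def gradient_saliency(x, target):
    """Vanilla gradient of logit `target` w.r.t. x, computed layer by layer."""
    pre, _ = forward(x)
    # Pull the one-hot output covector back through W2: row `target` of W2.
    g_hidden = list(W2[target])
    # Pull back through ReLU: keep entries whose pre-activation is positive.
    g_pre = [g if p > 0 else 0.0 for g, p in zip(g_hidden, pre)]
    # Pull back through W1: multiply by the transpose of W1.
    return [sum(W1[i][j] * g_pre[i] for i in range(len(W1)))
            for j in range(len(x))]


x = [0.3, -0.2, 0.8]
saliency = gradient_saliency(x, target=0)
print(saliency)
```

Each step multiplies by the transpose of one layer's local linear map, which is why the result is an input-space direction attached to a single output score. How SP softens and averages these layer-wise operators is defined in the paper itself and is not reproduced here.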

Paper Structure

This paper contains 70 sections, 44 equations, 15 figures, 4 tables, 1 algorithm.

Figures (15)

  • Figure 1: Semantic Pullback approximates a coarser local structure of the decision boundary than standard pullbacks/gradients. Arguably, this coarser structure reflects the network’s learned representations more faithfully. This is consistent with the significantly better results of SP in faithfulness scores, especially in Infidelity, which measures the effect of large perturbations (\ref{table:resnet50_results}).
  • Figure 2: We compute Semantic Pullbacks toward the logits of the four visible classes and visualize the corresponding heatmaps. The most salient pixels consistently cover the correct object regions.
  • Figure 3: Comparison of Explainers for ResNet50. Qualitatively, our method is most similar to GuidedGradCam, but it significantly outperforms the latter both quantitatively and in the ability to produce meaningful, target-specific perturbations, cf. \ref{sec:grad_cam_comparison}.
  • Figure 4: Counterfactual perturbations for ResNet50 using Pullback Ascent ($\alpha = 20, K=5$). Despite the small number of steps, one can clearly distinguish human-aligned, input- and target-specific features that appear in reasonable locations. Best viewed digitally.
  • Figure 5: Counterfactual perturbations for ResNet50 using Pullback Ascent ($\alpha = 20, K=10$). The method appears to enhance weak target-specific features that arguably were already present in the image but interpreted differently, e.g. the boy's nose is transformed into the church steeple. Top: Pullback Ascent toward each class. Bottom: Pullback Ascent added to the source image.
  • ...and 10 more figures