Pulling Back the Curtain on Deep Networks
Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń
TL;DR
Semantic Pullbacks (SP) address the instability and perceptual misalignment of gradient-based explanations by introducing a locally averaged adjoint transport through softened layer-wise operators. The method computes a Soft Pullback, adds a Double Pullback for attention layers, and uses a Pullback Ascent to produce localized, class-specific perturbations without altering forward model behavior. Across CNNs and Vision Transformers on Imagenette, SP consistently improves faithfulness metrics (e.g., Infidelity, FaithCorr) and robustness over standard attribution baselines, while enabling meaningful counterfactuals. The approach offers a practical, architecture-agnostic tool for interpreting neural computations and suggests a path-centric view of feature flow in deep networks. Potential extensions include other modalities, and future work includes the treatment of biases and adaptive hyperparameters.
Abstract
Post-hoc explainability methods typically associate each output score of a deep neural network with an input-space direction, most commonly instantiated as the gradient and visualized as a saliency map. However, these approaches often yield explanations that are noisy, lack perceptual alignment and, thus, offer limited interpretability. While many explanation methods attempt to address this issue via modified backward rules or additional heuristics, such approaches are often difficult to justify theoretically and frequently fail basic sanity checks. We introduce Semantic Pullbacks (SP), a faithful and effective post-hoc explanation method for deep neural networks. Semantic Pullbacks address the limitations above by isolating the network's effective linear action via a principled pullback formulation and refining it to recover coherent local structures learned by the target neuron. As a result, SP produces perceptually aligned, class-conditional explanations that highlight meaningful features, support compelling counterfactual perturbations, and admit a clear theoretical motivation. Across standard faithfulness benchmarks, Semantic Pullbacks significantly outperform established attribution methods on both classical convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), while remaining general and computationally efficient. Our method can be easily plugged into existing deep learning pipelines and extended to other modalities.
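As a point of reference for the gradient baseline described above, the following minimal sketch pulls a one-hot output covector back to input space through a toy network. It is not the Semantic Pullbacks method. It shows only the plain gradient (a vector-Jacobian product) that standard saliency maps visualize. The bias-free two-layer ReLU network, the weight names `W1` and `W2`, and the helper names `forward` and `gradient_saliency` are all assumptions made for illustration.

```python
import numpy as np

def forward(x, W1, W2):
    """Hypothetical bias-free two-layer ReLU network: returns scores and the ReLU gate mask."""
    pre = W1 @ x
    mask = (pre > 0).astype(x.dtype)
    return W2 @ (mask * pre), mask

def gradient_saliency(x, W1, W2, target):
    """Pull the one-hot covector for `target` back to input space (vector-Jacobian product)."""
    _, mask = forward(x, W1, W2)
    covector = np.zeros(W2.shape[0])
    covector[target] = 1.0
    # Transport the covector backwards: through W2, then the ReLU gates, then W1.
    hidden_grad = (W2.T @ covector) * mask
    return W1.T @ hidden_grad

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 5))   # toy weights, not a trained model
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=5)

scores, _ = forward(x, W1, W2)
saliency = gradient_saliency(x, W1, W2, target=1)

# For a bias-free piecewise-linear network, the gradient is the locally
# effective linear map, so its inner product with x reproduces the score.
print(np.isclose(saliency @ x, scores[1]))
```

In this toy setting the gradient coincides with the network's locally linear action on `x`. In trained image models the same quantity is typically noisy, which is the limitation the abstract attributes to gradient-based saliency.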
