A Super-pixel-based Approach to the Stable Interpretation of Neural Networks
Shizhan Gong, Jingwei Zhang, Qi Dou, Farzan Farnia
TL;DR
This work tackles the instability of gradient-based saliency maps caused by stochastic training. It introduces a semantically informed pixel grouping via super-pixels, enabling a grouped gradient approach that reduces variance and improves generalization of explanations. The authors provide theoretical stability guarantees and demonstrate, on CIFAR-10 and ImageNet, that super-pixel saliency maps offer higher stability, better generalization (MeGe), and enhanced interpretability (ROAR/ROAD) with only modest fidelity trade-offs. The method remains computationally efficient and complements existing gradient-based approaches like SmoothGrad and Grad-CAM, with potential applicability beyond image data.
Abstract
Saliency maps are widely used in the computer vision community for interpreting neural network classifiers. However, due to the randomness of training samples and optimization algorithms, the resulting saliency maps suffer from a significant level of stochasticity, making it difficult for domain experts to capture the intrinsic factors that influence the neural network's decision. In this work, we propose a novel pixel partitioning strategy to boost the stability and generalizability of gradient-based saliency maps. Through both theoretical analysis and numerical experiments, we demonstrate that grouping pixels reduces the variance of the saliency map and improves the generalization behavior of the interpretation method. Furthermore, we propose a sensible grouping strategy based on super-pixels, which cluster pixels into groups that align well with the semantic meaning of the images. We perform several numerical experiments on CIFAR-10 and ImageNet. Our empirical results suggest that super-pixel-based interpretation maps are consistently more stable and of higher quality than pixel-based saliency maps.
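
The variance-reduction claim can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes that a grouped saliency value is the mean gradient over each pixel group, uses a regular grid (the hypothetical `grid_groups` helper) as a stand-in for a real super-pixel algorithm, and simulates the randomness of training as independent Gaussian noise added to a fixed "true" saliency map. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np


def grid_groups(height, width, cell):
    """Hypothetical stand-in for super-pixels: label each pixel by its grid cell."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = (width + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]


def grouped_saliency(grad, labels):
    """Replace each pixel's gradient with the mean gradient of its group.

    Assumption: mean aggregation; the abstract does not specify the rule.
    """
    flat = labels.ravel()
    sums = np.bincount(flat, weights=grad.ravel())
    counts = np.bincount(flat)
    means = sums / np.maximum(counts, 1)
    return means[labels]  # broadcast group means back to pixel positions


rng = np.random.default_rng(0)

# A fixed "true" saliency map: a square object aligned with the 4x4 grid.
signal = np.zeros((32, 32))
signal[8:24, 8:24] = 1.0
labels = grid_groups(32, 32, cell=4)

# Simulate saliency maps from 50 training runs: same signal, independent noise.
runs = signal + rng.normal(scale=1.0, size=(50, 32, 32))
grouped = np.stack([grouped_saliency(r, labels) for r in runs])

# Average per-pixel variance across runs, before and after grouping.
pixel_var = runs.var(axis=0).mean()
group_var = grouped.var(axis=0).mean()
print(f"pixel-level variance: {pixel_var:.3f}")
print(f"grouped variance:     {group_var:.3f}")
```

In this toy setting each group averages 16 independent noise terms, so the grouped variance is much smaller than the pixel-level variance, while the noiseless signal is preserved exactly because the object boundaries coincide with the group boundaries. When groups do not align with the underlying structure, averaging blurs the map, which is the motivation for a semantically informed partition such as super-pixels.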
