Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
TL;DR
This paper introduces MoXI, a game-theoretic method for explaining image classifications by identifying groups of pixels whose cooperative interactions strongly influence prediction confidence. By combining self-context Shapley values and pairwise interactions within a greedy insertion/deletion framework, MoXI needs only a quadratic number of model evaluations, $O(|N|^2)$ in the number of pixels $|N|$, and so avoids the exponential cost of naive Shapley computation. Experiments on ImageNet with ViT, DeiT, and ResNet-18 show that MoXI produces sharper insertion/deletion curves, more concise heatmaps, and better class-discriminative localization than Grad-CAM, Attention rollout, and standard Shapley-value methods. The code is available at https://github.com/KosukeSumiyasu/MoXI.
Abstract
To better understand the behavior of image classifiers, it is useful to visualize the contribution of individual pixels to the model prediction. In this study, we propose a method, MoXI ($\textbf{Mo}$del e$\textbf{X}$planation by $\textbf{I}$nteractions), that efficiently and accurately identifies a group of pixels that yields high prediction confidence. The proposed method employs two game-theoretic concepts, Shapley values and interactions, to account for both the effect of each individual pixel and the cooperative influence of pixels on model confidence. Theoretical analysis and experiments demonstrate that our method identifies the pixels that contribute most to the model outputs better than widely used visualization methods based on Grad-CAM, Attention rollout, and the Shapley value. While prior studies have suffered from the exponential cost of computing Shapley values and interactions, we show that this can be reduced to quadratic cost for our task. The code is available at https://github.com/KosukeSumiyasu/MoXI.
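The quadratic cost can be illustrated with a minimal sketch of greedy insertion. It is not taken from the MoXI repository and rests on stated assumptions: the self-context Shapley value of a player is taken to be the confidence gain from showing that player alone, the self-context interaction treats the already selected group as a single player, and `toy_value` is a hypothetical stand-in for the classifier confidence on an image in which only the listed pixels (or patches) are visible. All function names are illustrative.

```python
from functools import lru_cache

WEIGHTS = [0.1, 0.5, 0.2, 0.15]  # hypothetical solo contributions of 4 players
BONUS_PAIR = frozenset({0, 1})   # hypothetical pair that cooperates strongly


@lru_cache(maxsize=None)
def toy_value(coalition):
    """Stand-in for model confidence when only `coalition` is visible."""
    value = sum(WEIGHTS[i] for i in coalition)
    if BONUS_PAIR <= coalition:
        value += 0.6  # cooperative effect that solo scores cannot see
    return value


def self_context_shapley(value_fn, j):
    """Assumed definition: gain from showing player j alone."""
    return value_fn(frozenset([j])) - value_fn(frozenset())


def self_context_interaction(value_fn, group, j):
    """Assumed definition: interaction of j with `group` seen as one player."""
    g = frozenset(group)
    return (value_fn(g | {j}) - value_fn(g)
            - value_fn(frozenset([j])) + value_fn(frozenset()))


def greedy_insertion(value_fn, n_players):
    """Return players in the order they are inserted by the greedy rule."""
    selected, remaining = [], set(range(n_players))
    while remaining:
        scores = {}
        for j in remaining:
            score = self_context_shapley(value_fn, j)
            if selected:  # add the cooperative term once a group exists
                score += self_context_interaction(value_fn, selected, j)
            scores[j] = score
        best = max(sorted(scores), key=scores.get)  # lowest index wins ties
        selected.append(best)
        remaining.remove(best)
    return selected


order = greedy_insertion(toy_value, len(WEIGHTS))
print(order)                           # [1, 0, 2, 3]
print(toy_value.cache_info().misses)   # 11 distinct coalitions evaluated
```

Ranking the players by solo contribution alone would place player 0 last, whereas the interaction term pulls it to second place because it cooperates with player 1. With caching, step $k$ evaluates only the coalitions formed by adding one remaining player to the current group, so the number of distinct evaluations is $1 + |N|(|N|+1)/2$ (11 for $|N| = 4$), which is quadratic rather than exponential in $|N|$.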
