Perturbation on Feature Coalition: Towards Interpretable Deep Neural Networks

Xuran Hu; Mingzhe Zhu; Zhenpeng Feng; Miloš Daković; Ljubiša Stanković

TL;DR

This work addresses a limitation of traditional perturbation-based explanations of deep neural networks: they overlook feature interdependencies. It introduces a coalition-guided perturbation framework built on unsupervised, network-centric extraction of correlated features, together with a regional consistency loss that stabilizes the explanations. The method groups correlated features into coalitions and uses them to produce saliency maps by minimizing a total loss $L = L_r + \mu L_{conf} + v L_c$, which balances perturbation strength, misclassification pressure, and regional coherence. Empirical results on ImageNet-1k with a pretrained VGG16 show improved localization of decision-relevant regions and higher confidence retention under pixel ablations, and an ablation study shows the importance of the consistency loss. By explicitly capturing feature dependencies and enforcing coalition-consistent explanations, the approach advances interpretable DNNs, with practical relevance for high-stakes applications.
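As a rough illustration of how the three terms of the total loss combine, the sketch below evaluates $L = L_r + \mu L_{conf} + v L_c$ for a single coalition. Only the weighted-sum form comes from the summary above. The concrete definitions of each term (mean mask magnitude for $L_r$, class confidence after perturbation for $L_conf$, mask variance inside the coalition for $L_c$) and the function name `total_loss` are assumptions made for illustration, not the paper's definitions.

```python
import numpy as np

def total_loss(mask, conf_perturbed, coalition, mu=1.0, v=1.0):
    """Illustrative L = L_r + mu * L_conf + v * L_c.

    All three term definitions below are assumptions for this sketch.
    mask           : 2D array of perturbation strengths in [0, 1]
    conf_perturbed : model confidence for the target class on the perturbed input
    coalition      : 2D boolean array marking one feature coalition
    """
    mask = np.asarray(mask, dtype=float)
    coalition = np.asarray(coalition, dtype=bool)

    # L_r (assumed): average perturbation strength, favouring small masks.
    l_r = np.abs(mask).mean()

    # L_conf (assumed): remaining class confidence, so minimising it
    # pushes the perturbed input towards misclassification.
    l_conf = float(conf_perturbed)

    # L_c (assumed): variance of mask values inside the coalition, so
    # correlated features are encouraged to be perturbed together.
    l_c = mask[coalition].var() if coalition.any() else 0.0

    return l_r + mu * l_conf + v * l_c
```

Under these assumed definitions, a mask that is uniform inside a coalition incurs no consistency penalty, while a mask that singles out one pixel of the coalition is penalized in proportion to `v`.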

Abstract

The inherent "black box" nature of deep neural networks (DNNs) compromises their transparency and reliability. Explainable AI (XAI) has recently attracted increasing attention from researchers, and several perturbation-based interpretation methods have emerged. However, these methods often fail to adequately account for feature dependencies. To solve this problem, we introduce a perturbation-based interpretation guided by feature coalitions, which leverages deep information from the network to extract correlated features. We then propose a carefully designed consistency loss to guide network interpretation. Both quantitative and qualitative experiments are conducted to validate the effectiveness of the proposed method. Code is available at github.com/Teriri1999/Perturebation-on-Feature-Coalition.

Paper Structure (10 sections, 11 equations, 4 figures, 2 tables)

Introduction
Methodology
Correlated Feature Extraction
Perturbation on Feature Coalition
Experimental Results
Experiment Setup
Qualitative Comparison
Quantitative Comparison
Ablation Study
Conclusion

Figures (4)

Figure 1: Implementation of the proposed method. Upper part: extraction of correlated feature coalitions; Lower part: coalition-guided perturbation interpretation.
Figure 2: Comparison of interpretation methods on ImageNet-1k. Columns from left to right: input sample, Grad-CAM (Selvaraju et al., 2017), Score-CAM (Wang et al., 2020), LRP (Bach et al., 2015), IG (Sundararajan et al., 2017), Occlusion (Zeiler and Fergus, 2014), our proposed method, and extracted feature coalition.
Figure 3: Insertion curves of interpretation methods. Our proposed method only requires minimal pixel information to achieve accurate classification.
Figure 4: Ablation study. The first row: input samples; The second row: saliency maps without consistency loss; The third row: saliency maps with consistency loss; The fourth row: correlated feature coalitions.
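
The insertion curves in Figure 3 can be read with the following minimal sketch in mind. It assumes the common insertion protocol: start from a blank baseline, reveal pixels in decreasing order of saliency, and record the model's class score after each step. The function name `insertion_curve`, the zero baseline, the single-channel image, and the callable `score_fn` are hypothetical choices for illustration, and the paper's exact evaluation settings may differ.

```python
import numpy as np

def insertion_curve(image, saliency, score_fn, baseline=None, steps=10):
    """Reveal pixels in decreasing saliency order and track the class score.

    image, saliency : 2D arrays of the same shape (single-channel, assumed)
    score_fn        : callable mapping an image array to a scalar class score
    Returns (fractions, scores, auc), where auc is the trapezoidal area.
    """
    image = np.asarray(image, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(image)  # assumed blank starting image

    # Pixel indices sorted from most to least salient (stable for ties).
    order = np.argsort(-saliency.ravel(), kind="stable")
    current = np.array(baseline, dtype=float)  # contiguous working copy
    flat_cur = current.reshape(-1)             # view into `current`
    flat_img = image.reshape(-1)
    n = order.size

    fractions = [0.0]
    scores = [float(score_fn(current))]
    for k in range(1, steps + 1):
        upto = (k * n) // steps
        idx = order[:upto]
        flat_cur[idx] = flat_img[idx]          # insert the top-`upto` pixels
        fractions.append(upto / n)
        scores.append(float(score_fn(current)))

    fractions = np.array(fractions)
    scores = np.array(scores)
    # Trapezoidal area under the score-versus-fraction curve.
    auc = float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(fractions)))
    return fractions, scores, auc
```

A saliency map that ranks the truly decisive pixels first makes the score rise early, which yields a larger area under the curve. This is the sense in which a method "requires minimal pixel information" for accurate classification.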
