Attribution for Enhanced Explanation with Transferable Adversarial eXploration

Zhiyu Zhu; Jiayu Zhang; Zhibo Jin; Huaming Chen; Jianlong Zhou; Fang Chen

TL;DR

This work tackles the interpretability of deep neural networks by boosting attribution quality through transferable adversarial exploration. It introduces AttEXplore++, an optimized framework that integrates ten transferable attack methods to generate informative baselines and gradient signals for attribution across CNNs and Vision Transformers. Empirical results on ImageNet show consistent gains in insertion and deletion metrics, with AttEXplore++ outperforming AttEXplore by 7.57% on average and surpassing other methods by 32.62% on average, while demonstrating robustness to randomness and parameter variations. The approach offers a practical path toward more reliable explanations in real-world deployments and provides open-source tooling for researchers and developers.

Abstract

The interpretability of deep neural networks is crucial for understanding model decisions in various applications, including computer vision. AttEXplore++, an advanced framework built upon AttEXplore, enhances attribution by incorporating transferable adversarial attack methods such as MIG and GRA, significantly improving the accuracy and robustness of model explanations. We conduct extensive experiments on five models, including CNNs (Inception-v3, ResNet-50, VGG16) and vision transformers (MaxViT-T, ViT-B/16), using the ImageNet dataset. Our method achieves an average performance improvement of 7.57% over AttEXplore and 32.62% compared to other state-of-the-art interpretability algorithms. Using insertion and deletion scores as evaluation metrics, we show that adversarial transferability plays a vital role in enhancing attribution results. Furthermore, we explore the impact of randomness, perturbation rate, noise amplitude, and diversity probability on attribution performance, demonstrating that AttEXplore++ provides more stable and reliable explanations across various models. We release our code at: https://anonymous.4open.science/r/ATTEXPLOREP-8435/
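The abstract names insertion and deletion scores as its evaluation metrics without describing how they are computed. The sketch below shows one common way to compute them, and it is not taken from the paper's released code. The function names, the all-zero baseline, the step count, and the assumption that the attribution map has the same shape as the input are all illustrative choices. `model_fn` is a hypothetical callable that returns a scalar score for the target class.

```python
import numpy as np


def _auc(scores):
    # Trapezoidal area under the curve, with the x-axis normalised to [0, 1].
    scores = np.asarray(scores, dtype=float)
    return float(((scores[:-1] + scores[1:]) / 2.0).mean())


def insertion_deletion_scores(model_fn, image, attribution, baseline=None, steps=10):
    """Return (insertion_score, deletion_score) for a single input.

    model_fn    : hypothetical callable, array shaped like `image` -> scalar score
    attribution : importance map, assumed here to have the same shape as `image`
    baseline    : reference input; assumed all-zero when not given
    """
    image = np.asarray(image, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(image)
    baseline = np.asarray(baseline, dtype=float)

    # Rank input elements from most to least important.
    order = np.argsort(-np.asarray(attribution, dtype=float).ravel())

    flat_image = image.ravel()
    flat_baseline = baseline.ravel()
    inserted = flat_baseline.copy()  # starts empty, features are added
    deleted = flat_image.copy()      # starts full, features are removed

    ins_curve = [model_fn(inserted.reshape(image.shape))]
    del_curve = [model_fn(deleted.reshape(image.shape))]
    for chunk in np.array_split(order, steps):
        inserted[chunk] = flat_image[chunk]
        deleted[chunk] = flat_baseline[chunk]
        ins_curve.append(model_fn(inserted.reshape(image.shape)))
        del_curve.append(model_fn(deleted.reshape(image.shape)))

    return _auc(ins_curve), _auc(del_curve)
```

Under this formulation, a better attribution map yields a higher insertion score (the model's confidence recovers quickly as important features are restored) and a lower deletion score (confidence drops quickly as they are removed).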

Paper Structure (26 sections, 9 equations, 6 figures, 3 tables)

Introduction
Related Works
Local Approximation Methods
Gradient-Based Attribution Methods
Adversarial Example-Based Attribution Methods
Adversarial Attacks
Preliminaries
Problem Definition
Integrated Gradients (IG)
The Relationship between Adversarial Attacks and Attribution
Exploring the Impact of Adversarial Transferability on Attribution Performance
AttEXplore Review
Different Ways to Obtain Gradient Information
The Impact of Randomness
Experiments
...and 11 more sections

Figures (6)

Figure 1: Flowchart of AttEXplore+ Framework
Figure 2: Non-linear Attribution Path
Figure 3: Evaluation of the impact of different random seeds on the insertion and deletion scores of different adversarial attack methods in AttEXplore+. The results show that randomness has a limited impact on attribution performance, exhibiting high stability.
Figure 4: The impact of diversity probability $DP$ on the insertion and deletion scores of AttEXplore and its variants across different models. Insertion and deletion scores are represented by blue and orange, respectively.
Figure 5: The impact of perturbation rate $\epsilon$ on the insertion and deletion scores of AttEXplore and its variants across different models. Insertion and deletion scores are represented by blue and orange, respectively.
...and 1 more figure
