Attribution for Enhanced Explanation with Transferable Adversarial eXploration

Zhiyu Zhu, Jiayu Zhang, Zhibo Jin, Huaming Chen, Jianlong Zhou, Fang Chen

TL;DR

This work tackles the interpretability of deep neural networks by boosting attribution quality through transferable adversarial exploration. It introduces AttEXplore++, an optimized framework that integrates ten transferable attack methods to generate informative baselines and gradient signals for attribution across CNNs and Vision Transformers. Empirical results on ImageNet show consistent gains in insertion and deletion metrics, with AttEXplore++ outperforming AttEXplore by 7.57% on average and surpassing other methods by 32.62% on average, while demonstrating robustness to randomness and parameter variations. The approach offers a practical path toward more reliable explanations in real-world deployments and provides open-source tooling for researchers and developers.

Abstract

The interpretability of deep neural networks is crucial for understanding model decisions in various applications, including computer vision. AttEXplore++, an advanced framework built upon AttEXplore, enhances attribution by incorporating transferable adversarial attack methods such as MIG and GRA, significantly improving the accuracy and robustness of model explanations. We conduct extensive experiments on five models, including CNNs (Inception-v3, ResNet-50, VGG16) and vision transformers (MaxViT-T, ViT-B/16), using the ImageNet dataset. Our method achieves an average performance improvement of 7.57% over AttEXplore and 32.62% compared to other state-of-the-art interpretability algorithms. Using insertion and deletion scores as evaluation metrics, we show that adversarial transferability plays a vital role in enhancing attribution results. Furthermore, we explore the impact of randomness, perturbation rate, noise amplitude, and diversity probability on attribution performance, demonstrating that AttEXplore++ provides more stable and reliable explanations across various models. We release our code at: https://anonymous.4open.science/r/ATTEXPLOREP-8435/
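
To make the insertion and deletion scores mentioned above more concrete, the following is a minimal sketch of how such scores are commonly computed. It is not taken from the released code. The function name `insertion_deletion`, the `predict` callable, the constant `baseline`, and the choice of an attribution map with the same shape as the image are all assumptions made for illustration. Pixels are ranked by attribution; insertion adds the most important pixels to a baseline image, deletion removes them from the original image, and each score is the normalized area under the resulting model-score curve.

```python
import numpy as np

def _auc(scores):
    # Trapezoidal area under the curve, with the x-axis normalized to [0, 1].
    scores = np.asarray(scores, dtype=float)
    return float(((scores[:-1] + scores[1:]) / 2).sum() / (len(scores) - 1))

def insertion_deletion(predict, image, attribution, steps=10, baseline=0.0):
    """Return (insertion_auc, deletion_auc) for one image.

    predict     -- hypothetical callable: image array -> scalar target-class score
    attribution -- importance map, assumed here to have the same shape as `image`
    baseline    -- assumed constant value used for "removed" pixels
    """
    order = np.argsort(-attribution.ravel())   # most important pixels first
    n = order.size
    flat = image.ravel()
    ins = np.full(n, baseline, dtype=float)    # insertion starts from the baseline
    dele = flat.astype(float).copy()           # deletion starts from the full image
    ins_scores = [predict(ins.reshape(image.shape))]
    del_scores = [predict(dele.reshape(image.shape))]
    for k in range(1, steps + 1):
        idx = order[: (n * k) // steps]        # top k/steps fraction of pixels
        ins[idx] = flat[idx]                   # reveal important pixels
        dele[idx] = baseline                   # erase important pixels
        ins_scores.append(predict(ins.reshape(image.shape)))
        del_scores.append(predict(dele.reshape(image.shape)))
    return _auc(ins_scores), _auc(del_scores)
```

Under this convention a better attribution map yields a higher insertion score and a lower deletion score, since the model's confidence rises quickly when important pixels are revealed and drops quickly when they are erased. In practice `predict` would wrap a trained classifier's softmax output for the target class, and a blurred image is sometimes used in place of a constant baseline.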

Paper Structure (26 sections, 9 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Flowchart of the AttEXplore++ Framework
  • Figure 2: Non-linear Attribution Path
  • Figure 3: Evaluation of the impact of different random seeds on the insertion and deletion scores of different adversarial attack methods in AttEXplore++. The results show that randomness has a limited impact on attribution performance, indicating high stability.
  • Figure 4: The impact of diversity probability $DP$ on the insertion and deletion scores of AttEXplore and its variants across different models. Insertion and deletion scores are represented by blue and orange, respectively.
  • Figure 5: The impact of perturbation rate $\epsilon$ on the insertion and deletion scores of AttEXplore and its variants across different models. Insertion and deletion scores are represented by blue and orange, respectively.
  • ...and 1 more figure