Attribution for Enhanced Explanation with Transferable Adversarial eXploration
Zhiyu Zhu, Jiayu Zhang, Zhibo Jin, Huaming Chen, Jianlong Zhou, Fang Chen
TL;DR
This work tackles the interpretability of deep neural networks by boosting attribution quality through transferable adversarial exploration. It introduces AttEXplore++, an optimized framework that integrates ten transferable attack methods to generate informative baselines and gradient signals for attribution across CNNs and Vision Transformers. Empirical results on ImageNet show consistent gains in insertion and deletion metrics, with AttEXplore++ outperforming AttEXplore by 7.57% on average and surpassing other methods by 32.62% on average, while demonstrating robustness to randomness and parameter variations. The approach offers a practical path toward more reliable explanations in real-world deployments and provides open-source tooling for researchers and developers.
Abstract
The interpretability of deep neural networks is crucial for understanding model decisions in various applications, including computer vision. AttEXplore++, an advanced framework built upon AttEXplore, enhances attribution by incorporating transferable adversarial attack methods such as MIG and GRA, significantly improving the accuracy and robustness of model explanations. We conduct extensive experiments on five models, including CNNs (Inception-v3, ResNet-50, VGG16) and vision transformers (MaxViT-T, ViT-B/16), using the ImageNet dataset. Our method achieves an average performance improvement of 7.57% over AttEXplore and 32.62% compared to other state-of-the-art interpretability algorithms. Using insertion and deletion scores as evaluation metrics, we show that adversarial transferability plays a vital role in enhancing attribution results. Furthermore, we explore the impact of randomness, perturbation rate, noise amplitude, and diversity probability on attribution performance, demonstrating that AttEXplore++ provides more stable and reliable explanations across various models. We release our code at: https://anonymous.4open.science/r/ATTEXPLOREP-8435/
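For readers unfamiliar with the insertion and deletion scores mentioned above, the following is a minimal sketch of how such scores are commonly computed, not the evaluation code used in this work. It assumes a zero baseline image, a small number of steps, and a toy linear scoring function standing in for a classifier; the names `insertion_deletion_scores`, `toy_model`, `weights`, and `image` are hypothetical. Pixels are ranked by attribution; the insertion curve tracks the model score as the top-ranked pixels are added to the baseline (higher area is better), and the deletion curve tracks the score as they are removed from the image (lower area is better).

```python
import numpy as np

def insertion_deletion_scores(model, image, attribution, baseline=None, steps=10):
    """Return (insertion_auc, deletion_auc) for a single input.

    model: callable mapping an array shaped like `image` to a scalar score.
    attribution: array shaped like `image`; larger means more important.
    baseline: reference input (assumed all zeros here if not given).
    """
    image = np.asarray(image, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(image)  # assumption: zero baseline
    flat_img = image.ravel()
    flat_base = np.asarray(baseline, dtype=float).ravel()

    # Rank pixels from most to least important.
    order = np.argsort(-np.asarray(attribution, dtype=float).ravel(), kind="stable")
    n = order.size
    counts = np.rint(np.linspace(0, n, steps + 1)).astype(int)

    ins_curve, del_curve = [], []
    for k in counts:
        top = order[:k]
        inserted = flat_base.copy()
        inserted[top] = flat_img[top]   # reveal the top-k pixels
        deleted = flat_img.copy()
        deleted[top] = flat_base[top]   # erase the top-k pixels
        ins_curve.append(float(model(inserted.reshape(image.shape))))
        del_curve.append(float(model(deleted.reshape(image.shape))))

    # Trapezoidal area under each curve, x-axis = fraction of pixels changed.
    x = counts / n
    def area(y):
        y = np.asarray(y)
        return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

    return area(ins_curve), area(del_curve)

# Toy stand-in for a classifier: a fixed weighted sum of pixel values.
weights = np.array([[0.4, 0.3], [0.2, 0.1]])
image = np.ones((2, 2))

def toy_model(x):
    return float(np.sum(weights * x))

# Using the true weights as the attribution map gives the ideal ranking.
ins_auc, del_auc = insertion_deletion_scores(toy_model, image, weights, steps=4)
print(ins_auc, del_auc)
```

In this toy setting, a good attribution map yields a larger insertion area and a smaller deletion area than a map that ranks the pixels in reverse order, which is the behavior these metrics are meant to reward.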
