TVE: Learning Meta-attribution for Transferable Vision Explainer
Guanchu Wang, Yu-Neng Chuang, Fan Yang, Mengnan Du, Chia-Yuan Chang, Shaochen Zhong, Zirui Liu, Zhaozhuo Xu, Kaixiong Zhou, Xuanting Cai, Xia Hu
TL;DR
TVE introduces meta-attribution to enable transferable explanations across vision models and downstream tasks. By pre-training a transferable explainer on large-scale data and applying a task-aligned transfer rule, TVE explains downstream models without training on task-specific data while maintaining fidelity and efficiency. The approach is grounded in a V-information-based explanation and is supported by theoretical error bounds. Empirical results across ViT, Swin, and DeiT architectures on Cats-vs-dogs, Imagenette, and CIFAR-10 demonstrate competitive fidelity and favorable latency, with strong transferability relative to baselines.
Abstract
Explainable machine learning significantly improves the transparency of deep neural networks. However, existing work is constrained to explaining the behavior of individual model predictions and lacks the ability to transfer explanations across models and tasks. As a result, explaining many tasks is time- and resource-consuming. To address this problem, we introduce a Transferable Vision Explainer (TVE) that can effectively explain various vision models in downstream tasks. Specifically, the transferability of TVE is realized through a pre-training process on large-scale datasets that learns the meta-attribution. This meta-attribution leverages the versatility of generic backbone encoders to comprehensively encode the attribution knowledge for the input instance, which enables TVE to transfer seamlessly to explaining various downstream tasks without training on task-specific data. Empirical studies involve explaining three different architectures of vision models across three diverse downstream datasets. The experimental results indicate that TVE is effective in explaining these tasks without additional training on downstream data.
