TVE: Learning Meta-attribution for Transferable Vision Explainer

Guanchu Wang; Yu-Neng Chuang; Fan Yang; Mengnan Du; Chia-Yuan Chang; Shaochen Zhong; Zirui Liu; Zhaozhuo Xu; Kaixiong Zhou; Xuanting Cai; Xia Hu

TL;DR

TVE introduces meta-attribution to enable transferable explanations across vision models and downstream tasks. By pre-training a transferable explainer on large-scale data and applying a task-aligned transfer rule, TVE explains downstream models without task-specific data while maintaining fidelity and efficiency. The approach is grounded in a $\mathcal{V}$-information-based explanation and is supported by a theoretical error bound. Experiments explaining ViT, Swin, and DeiT models on Cats-vs-dogs, Imagenette, and CIFAR-10 show competitive fidelity, favorable latency, and strong transferability compared with baselines.

Abstract

Explainable machine learning significantly improves the transparency of deep neural networks. However, existing work is limited to explaining the predictions of individual models and cannot transfer explanations across models and tasks. As a result, explaining many different tasks is time- and resource-consuming. To address this problem, we introduce a Transferable Vision Explainer (TVE) that can effectively explain various vision models in downstream tasks. Specifically, TVE achieves transferability by pre-training on large-scale datasets to learn a meta-attribution. This meta-attribution leverages the versatility of generic backbone encoders to comprehensively encode the attribution knowledge of an input instance, which enables TVE to transfer seamlessly to explain various downstream tasks without training on task-specific data. Our empirical studies explain three different vision model architectures on three diverse downstream datasets. The results indicate that TVE explains these tasks effectively without additional training on downstream data.
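Why a patch-to-embedding attribution can be reused for new tasks (the idea behind Figure 1(b)) can be illustrated with a toy example. This is a minimal sketch under assumptions that are not TVE's method: attributions are exact Shapley values computed by brute force, the backbone is a hypothetical nonlinear function mapping three scalar "patches" to a two-dimensional embedding, and the downstream head is linear. Under these assumptions, the linearity of Shapley values means a class attribution can be assembled from the meta-attribution and the head weights alone, with no further queries for the new task. The names `embed`, `meta`, `W`, `b`, and `transfer` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley(value_fn, n):
    """Exact Shapley values of an n-player game; value_fn maps a frozenset to a float."""
    phi = [0.0] * n
    for k in range(n):
        others = [j for j in range(n) if j != k]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[k] += weight * (value_fn(s | {k}) - value_fn(s))
    return phi

# Hypothetical toy setup: 3 image patches, each reduced to one scalar feature.
PATCHES = [0.5, -1.0, 2.0]
N = len(PATCHES)
EMB_DIM = 2

def embed(kept):
    """Toy nonlinear 'backbone': masked patches are zeroed, output is a 2-dim embedding."""
    x = [p if i in kept else 0.0 for i, p in enumerate(PATCHES)]
    return [x[0] * x[1] + x[2], max(x[0], x[2]) - x[1] ** 2]

# Meta-attribution (in this sketch): importance of patch k to embedding element d.
meta = [shapley(lambda s, d=d: embed(s)[d], N) for d in range(EMB_DIM)]

# Hypothetical linear downstream head with 2 classes on top of the frozen backbone.
W = [[1.5, -0.5], [-2.0, 1.0]]
b = [0.1, -0.3]

def transfer(meta, W):
    """Class-y attribution of patch k as sum_d W[y][d] * meta[d][k] (exact only for linear heads)."""
    return [[sum(W[y][d] * meta[d][k] for d in range(len(meta))) for k in range(N)]
            for y in range(len(W))]

transferred = transfer(meta, W)

# Reference: explain each class logit directly, which re-runs the game for every task.
direct = [shapley(lambda s, y=y: sum(W[y][d] * embed(s)[d] for d in range(EMB_DIM)) + b[y], N)
          for y in range(len(W))]

for y in range(len(W)):
    print(y, [round(v, 4) for v in transferred[y]], [round(v, 4) for v in direct[y]])
```

The two printed rows per class should agree, and the bias cancels because Shapley values depend only on marginal differences. For nonlinear heads such as the MLPs mentioned in Figure 1 this identity is no longer exact, so the sketch should not be read as TVE's transfer rule, whose approximation error the paper bounds separately in Theorem 1.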

Paper Structure (49 sections, 2 theorems, 14 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Notations
Target model.
Image Patching.
Model Perturbation.
Feature Attribution.
Feature Attribution can Transfer
Meta-attribution Transfer
$\mathcal{V}$-Information-based Explanation
Definition of Meta-attribution
Transfer to Task-aligned Explanation
Learning Meta-attribution
Explainer Pre-training
Generating Task-aligned Explanation
Theoretical Analysis
...and 34 more sections

Key Result

Theorem 1

Given the classifier ${H}_t(\bullet; \bullet)$ of the downstream task, if the outputs ${H}_t(\hat{\textbf{g}}_{k,z}; y)$ and ${H}_t(\hat{\textbf{h}}_{k,z}; y)$ fall within the range $1-\epsilon \leq \frac{{H}_t(\hat{\textbf{g}}_{k,z}; y)}{{H}_t(\textbf{g}_{k,z}; y)}, \ldots$ … where $\hat{\phi}_{k, y, {z}}$ and $\phi_{k, y, {z}}$ are given by Equation (eq:estimate_generic_at…).

Figures (12)

Figure 1: (a) Performance of TVE in explaining ViT-B, Swin-B, and DeiT-B on the Cats-vs-dogs, Imagenette, and CIFAR-10 datasets. The Fidelity$^+$ score is the area under the Fidelity$^+$-sparsity curve. (b) Illustration of attribution transfer. In this framework, the backbone can be a ViT encoder and the downstream classifiers can be MLPs. The embedding vector comprehensively encodes the features of the input image. Motivated by this, the meta-attribution comprehensively captures the importance of each input patch to each element of the embedding vector, which enables it to transfer seamlessly to explaining various downstream tasks.
Figure 2: $\mathrm{Fidelity}^+$-Sparsity-AUC ($\uparrow$) on the Imagenette (a), Cats-vs-dogs (b), and CIFAR-10 (c) datasets, and $\mathrm{Fidelity}^-$-Sparsity-AUC ($\downarrow$) on the Imagenette (d), Cats-vs-dogs (e), and CIFAR-10 (f) datasets.
Figure 3: Fine-tuning loss versus epoch (a), $\mathrm{Fidelity}^+ \uparrow$ versus sparsity (b), and $\mathrm{Fidelity}^- \downarrow$ versus sparsity (c) on the Imagenette dataset; fine-tuning loss versus epoch (d), $\mathrm{Fidelity}^+ \uparrow$ versus sparsity (e), and $\mathrm{Fidelity}^- \downarrow$ versus sparsity (f) on the Cats-vs-dogs dataset.
Figure 4: Fidelity of ablation studies.
Figure 5: Throughput of explaining different architectures.
...and 7 more figures
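Figures 2 and 3 report fidelity-versus-sparsity curves and the areas under them. A minimal sketch of how such scores can be computed, assuming a common convention from the explanation literature that this summary does not confirm for TVE: at sparsity $s$, the top $(1-s)$ fraction of patches by attribution is marked important; Fidelity$^+$ is the drop in the label probability when the important patches are replaced by a baseline, and Fidelity$^-$ is the drop when only the important patches are kept. The AUC uses the trapezoidal rule. The toy classifier, the baseline value of 0, and all helper names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy classifier: 2 classes over 4 scalar "patch" features.
WEIGHTS = [[2.0, 1.0, 0.0, -1.0], [-1.0, 0.0, 1.0, 2.0]]

def predict(x):
    return softmax([sum(w * v for w, v in zip(row, x)) for row in WEIGHTS])

def mask(x, keep, baseline=0.0):
    """Replace every patch not in `keep` with a baseline value (assumed masking scheme)."""
    return [v if i in keep else baseline for i, v in enumerate(x)]

def fidelity_at(x, attribution, label, sparsity):
    """Fidelity+/- at one sparsity level; sparsity = fraction of patches NOT marked important."""
    n = len(x)
    n_important = round((1.0 - sparsity) * n)
    ranked = sorted(range(n), key=lambda i: attribution[i], reverse=True)
    important = set(ranked[:n_important])
    p_full = predict(x)[label]
    fid_plus = p_full - predict(mask(x, set(range(n)) - important))[label]  # drop important patches
    fid_minus = p_full - predict(mask(x, important))[label]                 # keep only important patches
    return fid_plus, fid_minus

def trapezoid_auc(xs, ys):
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))

def fidelity_aucs(x, attribution, label, sparsities):
    """Return (Fidelity+ AUC, Fidelity- AUC) over the given sparsity grid."""
    pairs = [fidelity_at(x, attribution, label, s) for s in sparsities]
    return (trapezoid_auc(sparsities, [p for p, _ in pairs]),
            trapezoid_auc(sparsities, [m for _, m in pairs]))

x = [1.0, 1.0, 1.0, 1.0]
sparsities = [i / 4 for i in range(5)]
aligned = fidelity_aucs(x, [3.0, 1.0, -1.0, -3.0], label=0, sparsities=sparsities)
reversed_rank = fidelity_aucs(x, [-3.0, -1.0, 1.0, 3.0], label=0, sparsities=sparsities)
print("aligned:", aligned, "reversed:", reversed_rank)
```

An attribution whose ranking matches the classifier's evidence for the label should obtain a higher Fidelity$^+$ AUC and a lower Fidelity$^-$ AUC than the reversed ranking, which is the direction indicated by the arrows in the Figure 2 caption.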

Theorems & Definitions (5)

Definition 1: Meta-attribution
Definition 2: Attribution Transfer
Theorem 1: Explanation Error Bound
Theorem 1: Explanation Error Bound
Proof
