Sparse, Efficient and Explainable Data Attribution with DualXDA

Galip Ümit Yolcu, Moritz Weckbecker, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

TL;DR

DualXDA tackles the high computational cost and low sparsity of Data Attribution by introducing DualDA, a kernel SVM surrogate that yields sparse, faithful global and local attributions, and XDA, which traces these attributions back to input features via Layer-wise Relevance Propagation. The framework achieves speedups of up to 4,100,000x over Influence Functions and up to 11,000x over its most efficient approximation while maintaining attribution quality, and provides novel joint explanations linking influential training samples to meaningful input features. Extensive quantitative benchmarks on MNIST, CIFAR-10, and AwA2 demonstrate strong performance and resource efficiency, complemented by qualitative XDA case studies. Limitations include the restriction to classification tasks and moderate-scale datasets, with clear directions for extending the approach to regression and multimodal models and for a more rigorous evaluation of XDA.

Abstract

Data Attribution (DA) is an emerging approach in the field of eXplainable Artificial Intelligence (XAI) that aims to identify the influential training datapoints which determine model outputs. It seeks to provide transparency about the model and its individual predictions, e.g., for model debugging by identifying data-related causes of suboptimal performance. However, existing DA approaches suffer from prohibitively high computational costs and memory demands when applied to even medium-scale datasets and models, forcing practitioners to resort to approximations that may fail to capture the true inference process of the underlying model. Additionally, current attribution methods exhibit low sparsity, assigning non-negligible attribution scores to a large number of training examples, which hinders the discovery of decisive patterns in the data. In this work, we introduce DualXDA, a framework for sparse, efficient and explainable DA, comprising two interlinked approaches: Dual Data Attribution (DualDA) and eXplainable Data Attribution (XDA). With DualDA, we propose a novel approach for efficient and effective DA, leveraging Support Vector Machine theory to provide fast and naturally sparse data attributions for AI predictions. In extensive quantitative analyses, we demonstrate that DualDA achieves high attribution quality and excels at a series of evaluated downstream tasks, while at the same time reducing explanation time by up to 4,100,000x compared to the original Influence Functions method, and by up to 11,000x compared to the method's most efficient approximation in the literature to date. We further introduce XDA, a method that enhances Data Attribution with capabilities from feature attribution methods to explain, in terms of impactful features, why training samples are relevant for the prediction of a test sample; we showcase and verify XDA qualitatively in detail.

Paper Structure

This paper contains 54 sections, 1 theorem, 47 equations, 25 figures, 7 tables.

Key Result

Theorem 3.1

Let $W$ denote the solution to the soft-margin SVM optimization problem (eq:qp_soft_margin). Denote by $W^i$ the parameters of an SVM trained on the same dataset with an optimization criterion modified w.r.t. the $i$-th training sample: the contribution of the $i$-th training sample is down-weighted by a factor of $\varepsilon$. Then, for any test sample $x$, the infinitesimal change in $Wf(x;\mathbf{\vartheta})$ …

Figures (25)

  • Figure 1: DualDA efficiently identifies training samples that are influential both for the overall model fit (global attribution) and for the predictions on specific test samples (local attribution). (1) Our method assumes models consisting of a nonlinear feature extractor $f$ followed by a fully-connected layer as the classification head $g$. (2) DualDA replaces the final layer of the original model with a linear SVM. The resulting weight vector $w$ can then be expressed as a linear combination of the final-layer latent embeddings of the training samples. For simplicity and legibility, a binary classification case is visualized, whereas DualDA employs a multiclass SVM. (3) The global attribution of each training datapoint is quantified by its scalar coefficient $\lambda_i$ in the linear decomposition of $w$. (4) Since $w$ is a combination of training feature embeddings, the output of the surrogate model for a given test point decomposes into a sum of contributions from the individual training samples. This local attribution (i.e., the contribution of a training point to the prediction for a specific test point) is the inner product of the feature embeddings of the training and test samples, scaled by the training sample's global coefficient $\lambda_i$. (5) To trace these attributions back to the input space, XDA applies Layer-wise Relevance Propagation (LRP) to DualDA attributions, propagating them from the surrogate model's output through the feature extractor down to the input pixels of both the training and the test sample. The result, for each training–test pair, is a pair of heatmaps (one per sample) highlighting input regions that contributed positively or negatively to the model's inference.
  • Figure 2: Average rank across all metrics plotted against the total runtime for computing attributions for 2,000 test images. DualDA demonstrates competitive performance with drastically reduced computational time across datasets. Results are shown for nine methods, with ranks averaged over seven metrics, on three datasets with their respective models.
  • Figure 3: Evaluation results on the AwA2 dataset. The rank of DualDA with the best-performing hyperparameter $C$ is shown above the corresponding bar. Note that Mislabeling Detection requires computing the self-influence of the entire training set (see the appendix on metrics). For LiSSA, computing self-attributions for all training points would take roughly a year of runtime and is therefore computationally infeasible. As GradCos is defined as the cosine of the angle between the test and training samples' feature vectors, the self-attribution for GradCos always equals 1 regardless of the sample.
  • Figure 4: Caching times and cumulative explanation times over 2,000 test samples for all methods, on a logarithmic scale, together with the size of the precomputed cache per method. LiSSA requires no caching and thus has no caching time or cache size. Because computations run on the GPU, the sparsity level of DualDA has only a very minor impact on runtime.
  • Figure 5: Cumulative distribution of positive and negative attributions for various DA methods on the AwA2 dataset. The $x$-axis represents the top $x$% of training samples sorted by absolute attribution score, and the $y$-axis shows the fraction of the total absolute attribution contained within those samples. The curves are computed for each test sample individually and then averaged over the complete test set.
  • ...and 20 more figures
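The global/local decomposition described for Figure 1 can be illustrated with a minimal NumPy sketch. It assumes the binary case shown in the figure (DualDA itself uses a multiclass SVM), a hinge-loss linear SVM without a bias term, and that the rows of F are precomputed final-layer embeddings $f(x_i)$. The solver is standard dual coordinate descent, chosen here only because it exposes the dual coefficients $\alpha_i$ directly; it is not claimed to be the authors' solver. In this binary setting the signed coefficient $\alpha_i y_i$ plays the role of $\lambda_i$, and the local attribution of training sample $i$ to a test point $x$ is $\alpha_i y_i \langle f(x_i), f(x)\rangle$. All function names are hypothetical.

```python
import numpy as np


def train_linear_svm_dual(F, y, C=1.0, epochs=100, seed=0):
    """Hypothetical helper: binary linear SVM (hinge loss, no bias) via dual coordinate descent.

    F: (n, d) array of embeddings f(x_i); y: (n,) labels in {-1, +1}.
    Returns alpha (n,) with 0 <= alpha_i <= C and w = sum_i alpha_i * y_i * F[i].
    """
    rng = np.random.default_rng(seed)
    n, d = F.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q_diag = np.einsum("ij,ij->i", F, F)  # Q_ii = <f(x_i), f(x_i)>
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q_diag[i] == 0.0:
                continue
            g = y[i] * (w @ F[i]) - 1.0  # gradient of the dual objective w.r.t. alpha_i
            # Projected gradient: ignore moves that would leave the box [0, C].
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q_diag[i], 0.0), C)
                w += (alpha[i] - old) * y[i] * F[i]  # maintain w = sum_j alpha_j y_j f(x_j)
    return alpha, w


def global_attributions(alpha, y):
    # Signed coefficient of each training embedding in the decomposition of w.
    return alpha * y


def local_attributions(alpha, y, F_train, f_test):
    # Contribution of each training sample to the surrogate output w . f(x_test).
    return alpha * y * (F_train @ f_test)


# Toy usage: two Gaussian blobs stand in for learned embeddings (an assumption).
rng = np.random.default_rng(42)
F = np.vstack([rng.normal(3.0, 1.0, (50, 2)), rng.normal(-3.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
alpha, w = train_linear_svm_dual(F, y, C=1.0)
f_test = np.array([1.0, 2.0])
local = local_attributions(alpha, y, F, f_test)
print(np.count_nonzero(alpha), "of", len(y), "training samples have nonzero attribution")
print(np.isclose(local.sum(), w @ f_test))  # True
```

Training samples with $\alpha_i = 0$ (those strictly outside the margin) receive exactly zero global and local attribution, which illustrates where the natural sparsity of SVM-based attributions comes from; the last line checks that the local attributions sum to the surrogate output $w \cdot f(x)$.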
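The cumulative curves described for Figure 5 can be sketched as follows, assuming an attribution matrix with one row per test sample and one column per training sample. The function name and the grid of evaluated fractions are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def cumulative_attribution_curve(A, fractions):
    """Average share of total absolute attribution held by the top x% of training samples.

    A: (n_test, n_train) attribution matrix, one row per test sample.
    fractions: increasing values in (0, 1], the "top x%" positions to evaluate.
    Returns one averaged y-value per fraction.
    """
    A = np.abs(np.asarray(A, dtype=float))
    n_train = A.shape[1]
    sorted_desc = -np.sort(-A, axis=1)          # largest |attribution| first in each row
    cum = np.cumsum(sorted_desc, axis=1)
    totals = cum[:, -1:]                        # total absolute attribution per test sample
    safe = np.where(totals > 0, totals, 1.0)    # avoid division by zero for all-zero rows
    shares = cum / safe
    # Number of top training samples included at each fraction (at least one).
    k = np.clip(np.ceil(np.asarray(fractions) * n_train).astype(int), 1, n_train)
    per_test = shares[:, k - 1]                 # shape (n_test, len(fractions))
    return per_test.mean(axis=0)                # average the per-test-sample curves
```

For the toy matrix [[4, 0, 0, 0], [1, 1, 1, 1]] and fractions [0.25, 0.5, 1.0], the function returns [0.625, 0.75, 1.0]: the sparse first row concentrates all attribution in its top sample, while the second spreads it evenly. Positive and negative attributions could be analyzed separately by passing np.clip(A, 0, None) and np.clip(-A, 0, None); whether the paper splits them exactly this way is an assumption.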

Theorems & Definitions (2)

  • Definition 2.1
  • Theorem 3.1