Towards Understanding the Influence of Training Samples on Explanations

André Artelt; Barbara Hammer

TL;DR

This work tackles the problem of tracing explanations back to training data by formalizing the notion of influential training samples that shape explanations in Explainable AI. It introduces a Data-SHAP–style gradient Monte Carlo algorithm to identify samples that strongly affect a given explanation, and applies it to two case studies involving counterfactual recourse: (i) the average cost of recourse and (ii) the cost-difference across protected groups. The methodology demonstrates that removing influential samples can reduce recourse costs and fairness disparities with limited degradation of predictive performance, and often performs better than standard Data-SHAP baselines in preserving accuracy. The findings have practical implications for data cleaning, fairness auditing, and trust in explanations, while also suggesting extensions to groups of samples and formal guarantees for the approximations used.

Abstract

Explainable AI (XAI) is widely used to analyze the decision-making of AI systems, for example by providing counterfactual explanations for recourse. When unexpected explanations occur, users may want to understand which properties of the training data shaped them. Within data valuation, initial approaches have been proposed that estimate the influence of individual data samples on a given model. This not only helps determine the value of the data, but also offers insight into how individual, potentially noisy or misleading, examples affect a model, which is crucial for interpretable AI. In this work, we apply the concept of data valuation to model evaluation, focusing on how individual training samples affect a model's internal reasoning rather than only its predictive performance. To this end, we introduce the novel problem of identifying the training samples that shape a given explanation or related quantity, and investigate the particular case of the cost of computational recourse. We propose an algorithm to identify such influential samples and conduct extensive empirical evaluations in two case studies.
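To make the Data-SHAP-style Monte Carlo idea from the TL;DR concrete, the following is a minimal sketch of permutation-sampling Shapley estimation for per-sample influence. It is not the authors' algorithm: the function name `monte_carlo_shapley`, its parameters, and the toy utility are assumptions made for illustration. In the setting described here, the utility would presumably retrain the model on the given subset of training samples and return an explanation-related quantity such as the average cost of recourse; an additive toy utility stands in for that step so the sketch stays self-contained.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_shapley(
    n_samples: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate a Shapley value per training sample by permutation sampling.

    `utility` maps a list of training-sample indices to a scalar score.
    All names here are hypothetical, chosen for this sketch.
    """
    rng = random.Random(seed)
    values = [0.0] * n_samples
    order = list(range(n_samples))
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset: List[int] = []
        previous = utility(subset)  # score of the empty training set
        for i in order:
            subset.append(i)
            current = utility(subset)
            # Marginal contribution of sample i given the samples before it.
            values[i] += current - previous
            previous = current
    return [v / n_permutations for v in values]


# Toy stand-in utility (assumption): each sample adds a fixed amount to the
# score, so the exact Shapley value of sample i is toy_weights[i].
toy_weights = [0.0, 1.0, 5.0, 2.0]


def toy_utility(subset: Sequence[int]) -> float:
    return sum(toy_weights[i] for i in subset)


influence = monte_carlo_shapley(len(toy_weights), toy_utility, n_permutations=50)
# Indices of the samples ranked from most to least influential.
ranking = sorted(range(len(influence)), key=lambda i: influence[i], reverse=True)
```

Because every permutation telescopes, the estimates always sum to the utility of the full set minus the utility of the empty set. With a real utility, each call would require retraining the model, which is why the number of permutations (and any truncation or gradient-based shortcut) dominates the cost.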

Paper Structure (18 sections, 12 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Our contributions:
Foundations
Data-Valuation
Counterfactuals for Computational Recourse
Implementation
Influence of Training Samples on Explanations
Quantifying the Influence of Training Samples on Explanations
Reduction to a Game-theoretic Approach
Computational Considerations
Case-Study I: Cost of Recourse
Case-Study II: Difference in the Cost of Recourse
Experiments
Data
Setup
...and 3 more sections

Figures (2)

Figure 1: Case-Study I: Effect of removing training samples that have a high influence on the average cost of recourse (Eq. \ref{eq:avg_cost_recourse}). We show the effect (mean and variance over all folds) on the average cost of recourse, as well as on the predictive performance (i.e., F1-score).
Figure 2: Case-Study II: Effect of removing training samples that have a high influence on the difference in the cost of recourse (Eq. \ref{eq:diff_cost_recourse}). We show the effect on the difference in the cost of recourse, as well as on the predictive performance (i.e., F1-score). Note that we only consider the train-test split with the worst original difference.

Theorems & Definitions (2)

Remark
Definition: Influential Training Samples
