
A Bayesian Information-Theoretic Approach to Data Attribution

Dharmesh Tailor, Nicolò Felicioni, Kamil Ciosek

Abstract

Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce, i.e., the increase in predictive entropy at a query when the subset is withheld from training. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction, yielding scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval, and coreset selection, demonstrating that our method scales to modern architectures while bridging principled measures and practice.
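To make the scoring criterion concrete, here is a minimal NumPy sketch under simplifying assumptions: an RBF kernel on raw inputs stands in for the paper's tangent-feature surrogate, the predictive distribution is a plain GP regression posterior, and all names (`rbf_kernel`, `info_loss`, etc.) are hypothetical, not the authors' code. Information loss of a subset is computed as the increase in Gaussian predictive entropy at the query when the subset is withheld.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between row-stacked inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def predictive_variance(X_train, x_query, noise_var):
    """GP posterior predictive variance at the query given X_train."""
    if len(X_train) == 0:
        return rbf_kernel(x_query, x_query)[0, 0]
    K = rbf_kernel(X_train, X_train)
    k_star = rbf_kernel(X_train, x_query)            # shape (n, 1)
    k_ss = rbf_kernel(x_query, x_query)[0, 0]
    A = K + noise_var * np.eye(len(X_train))
    return k_ss - (k_star.T @ np.linalg.solve(A, k_star))[0, 0]

def info_loss(X_train, subset_idx, x_query, noise_var):
    """Entropy increase at the query when subset_idx is withheld.
    Gaussian entropy is 0.5*log(2*pi*e*v), so the entropy difference
    reduces to 0.5*log(v_without / v_with)."""
    keep = np.setdiff1d(np.arange(len(X_train)), subset_idx)
    v_with = predictive_variance(X_train, x_query, noise_var)
    v_without = predictive_variance(X_train[keep], x_query, noise_var)
    return 0.5 * np.log(v_without / v_with)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))        # toy training inputs
x_star = rng.normal(size=(1, 5))    # query input
# Score each example individually; higher = more informative for x_star.
scores = [info_loss(X, [i], x_star, noise_var=0.1) for i in range(len(X))]
print(int(np.argmax(scores)))
```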


Paper Structure

This paper contains 49 sections, 2 theorems, 66 equations, 9 figures, 1 table.

Key Result

Lemma 1

Fix a query $\mathbf{x}_*$ and subset size $M$, and let $\mathcal{S}_M := \{S \subseteq \mathcal{D} : |S| = M\}$. Then, as $\sigma^2 \to \infty$, the information-gain relaxation matches information loss to leading order, uniformly over $S \in \mathcal{S}_M$. In particular, if $\left\lVert\mathbf{k}_{S*}\right\rVert^2$ has a unique maximizer $S^\dagger$ over $\mathcal{S}_M$, then both criteria are maximized by $S^\dagger$ for sufficiently large $\sigma^2$.
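The lemma is easy to sanity-check numerically under assumed definitions (a sketch, not the paper's code): take InfoGain$(S)$ as the entropy reduction at $\mathbf{x}_*$ from conditioning on $S$ alone, and InfoLoss$(S)$ as the entropy increase from withholding $S$ from the full set. As $\sigma^2$ grows, both argmaxes should collapse onto the maximizer of $\lVert\mathbf{k}_{S*}\rVert^2$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))       # small pool so all subsets can be enumerated
x_star = rng.normal(size=(1, 3))
M = 2

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def post_var(idx, noise_var):
    """GP predictive variance at x_star conditioned on X[idx]."""
    idx = list(idx)
    if not idx:
        return rbf(x_star, x_star)[0, 0]
    K = rbf(X[idx], X[idx]) + noise_var * np.eye(len(idx))
    k = rbf(X[idx], x_star)
    return rbf(x_star, x_star)[0, 0] - (k.T @ np.linalg.solve(K, k))[0, 0]

full = tuple(range(len(X)))
for noise_var in [1.0, 1e2, 1e4]:
    best = {}
    for name, score in [
        ("InfoGain", lambda S: 0.5 * np.log(post_var((), noise_var)
                                            / post_var(S, noise_var))),
        ("InfoLoss", lambda S: 0.5 * np.log(post_var(set(full) - set(S), noise_var)
                                            / post_var(full, noise_var))),
        ("||k_S*||^2", lambda S: (rbf(X[list(S)], x_star) ** 2).sum()),
    ]:
        best[name] = max(itertools.combinations(full, M), key=score)
    # For large noise_var, the three argmaxes should coincide.
    print(noise_var, best)
```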

Figures (9)

  • Figure 1: Bayesian information-theoretic training data attribution (TDA). For a query input $\mathbf{x}_*$, we quantify the contribution of a training subset $S$ via its information loss: the increase in (posterior) predictive entropy at $\mathbf{x}_*$ when $S$ is withheld from training, attributing credit to examples that resolve epistemic uncertainty rather than label noise. Left: we relate information loss to a submodular information gain relaxation whose leading-order term matches information loss in a high-noise regime, followed by a linear-response variance correction that implements greedy selection with a squared inner-product score, enabling efficient retrieval in a vector database (a code sketch of this greedy step follows the figure list). Right: CIFAR-10 (ResNet-9) examples showing the top-ranked training images under an influence-function estimator versus our information loss criterion.
  • Figure 2: Information efficiency experiments on binary CIFAR-10. Top: selecting subsets via greedy InfoGain and varying the observation noise. Bottom: fixed observation noise level ($\sigma^2=10^3$), using InfoGain and approximate InfoGain.
  • Figure 3: Our Bayesian information-theoretic methods---InfoLoss, InfoGain, and InfoGain (approx)---recover strong attribution signal in identified subsets, as reflected by higher brittleness. We plot the fraction of previously correct test queries that become misclassified after removing the same-label training examples attributed by each method and retraining. Left: Fashion-MNIST (MLP), where our methods dominate and KronInfluence is the closest baseline. Middle: CIFAR-10 (ResNet-9), where our methods lead across budgets with TRAK most competitive. Right: RTE (BERT), where KronInfluence is strongest at smaller budgets, while InfoLoss/InfoGain catch up and are competitive at larger budgets. Error bars represent standard error across repeated runs.
  • Figure 4: Examples of backdoored CIFAR-10 images with triggers. The trigger is a $3{\times}3$ black/white random pattern placed at the bottom-right corner, causing the model to misclassify the image as the corrupted class.
  • Figure 5: InfoGain and InfoGain (approx) construct substantially better CIFAR-10 coresets than existing TDA baselines which perform worse than random selection. We plot test accuracy after retraining on the selected subset for coreset sizes ranging from $0.2\%$ to $10\%$ of the training set. Error bars show standard error across repeated runs.
  • ...and 4 more figures
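The greedy squared inner-product step described in Figure 1 reduces to a top-$k$ scan when scores are per-example, since $\lVert\mathbf{k}_{S*}\rVert^2$ decomposes additively over the elements of $S$. Below is a minimal sketch under that assumption, with hypothetical names and the linear-response variance correction omitted:

```python
import numpy as np

def greedy_squared_ip(features, query_feat, budget):
    """Select training rows by squared inner product with the query's
    tangent feature; because the subset objective is additive, greedy
    selection coincides with a top-k scan over per-example scores."""
    scores = (features @ query_feat) ** 2   # squared inner-product score
    return np.argsort(-scores)[:budget]

rng = np.random.default_rng(2)
Phi = rng.normal(size=(10000, 64))   # tangent features of the training set
phi_star = rng.normal(size=64)       # tangent feature of the query
print(greedy_squared_ip(Phi, phi_star, budget=20))
```

Since the score is a squared inner product over fixed feature vectors, the scan can be served by a standard maximum-inner-product index over the training features, which is what makes the vector-database retrieval described in the abstract practical.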

Theorems & Definitions

  • Lemma 1