MAGIC: Near-Optimal Data Attribution for Deep Learning

Andrew Ilyas, Logan Engstrom

TL;DR

MAGIC addresses predictive data attribution for non-convex deep learning by introducing a single-model attribution framework and a method to compute the exact influence of training data on a specific model. It leverages Replay to obtain the exact metagradient of the model output with respect to data weights, enabling near-optimal estimation of how adding or removing data changes predictions in large-scale iterative, smooth learners. Across vision and language benchmarks, MAGIC achieves near-perfect alignment with ground-truth changes (LDS ≈ 1.0 for small data drops), significantly outperforming baselines. The approach enables practical operations such as targeted data deletion and model debugging with high fidelity, albeit with higher computational cost that scales with the number of test samples; future work aims to improve efficiency and extend to broader scenarios.

Abstract

The goal of predictive data attribution is to estimate how adding or removing a given set of training datapoints will affect model predictions. In convex settings, this goal is straightforward (i.e., via the infinitesimal jackknife). In large-scale (non-convex) settings, however, existing methods are far less successful: their estimates often correlate only weakly with ground truth. In this work, we present a new data attribution method (MAGIC) that combines classical methods and recent advances in metadifferentiation to (nearly) optimally estimate the effect of adding or removing training data on model predictions.
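
To make the idea of differentiating a model output with respect to data weights concrete, here is a minimal sketch. It is not the authors' Replay algorithm: it trains a one-parameter linear model with weighted full-batch gradient descent, stores the whole parameter trajectory, and backpropagates a test loss through every step. The model, the data, the hyperparameters, and all names (`train`, `eval_loss`, `metagradient`) are illustrative assumptions.

```python
def train(xs, ys, w, lr=0.1, steps=50):
    """Weighted full-batch gradient descent on a 1-D linear model y ~ theta * x.
    Returns the full parameter trajectory [theta_0, ..., theta_T]."""
    n = len(xs)
    thetas = [0.0]
    for _ in range(steps):
        theta = thetas[-1]
        grad = sum(wi * xi * (xi * theta - yi) for wi, xi, yi in zip(w, xs, ys)) / n
        thetas.append(theta - lr * grad)
    return thetas


def eval_loss(theta, x_test, y_test):
    """Squared-error loss of the trained model on one test point."""
    return 0.5 * (x_test * theta - y_test) ** 2


def metagradient(xs, ys, w, x_test, y_test, lr=0.1, steps=50):
    """Exact d(eval_loss)/d(w_i), obtained by backpropagating through training."""
    n = len(xs)
    thetas = train(xs, ys, w, lr, steps)
    # adjoint holds d(eval_loss)/d(theta_t); start at the final step t = T.
    adjoint = (x_test * thetas[-1] - y_test) * x_test
    grad_w = [0.0] * n
    for t in range(steps - 1, -1, -1):
        theta = thetas[t]
        for i in range(n):
            # Direct effect: d(theta_{t+1})/d(w_i) = -(lr/n) * x_i * (x_i*theta_t - y_i)
            grad_w[i] += adjoint * (-lr / n) * xs[i] * (xs[i] * theta - ys[i])
        # Chain rule: d(theta_{t+1})/d(theta_t) = 1 - (lr/n) * sum_i w_i * x_i^2
        adjoint *= 1.0 - (lr / n) * sum(wi * xi * xi for wi, xi in zip(w, xs))
    return grad_w


# Hypothetical toy data.
xs = [0.5, -1.0, 1.5, 2.0, -0.3]
ys = [1.1, -1.9, 3.2, 3.8, -0.4]
ones = [1.0] * len(xs)
x_test, y_test = 1.2, 2.0

grad_w = metagradient(xs, ys, ones, x_test, y_test)
base = eval_loss(train(xs, ys, ones)[-1], x_test, y_test)

# First-order prediction of the test loss after dropping example 2 (w_2: 1 -> 0),
# compared with actually retraining without it.
dropped = [1.0, 1.0, 0.0, 1.0, 1.0]
predicted = base + sum(g * (d - 1.0) for g, d in zip(grad_w, dropped))
actual = eval_loss(train(xs, ys, dropped)[-1], x_test, y_test)
```

Because this toy learner is smooth, each entry of `grad_w` can be checked against a finite difference of `eval_loss` under a small perturbation of the corresponding weight. How closely `predicted` tracks `actual` depends on how far the new weights are from the all-ones vector, since the prediction is only a first-order estimate.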

Paper Structure

This paper contains 32 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: MAGIC nearly perfectly predicts the effect of training data removal. In contrast to the best baselines [park2023trak, grosse2023studying], MAGIC produces estimates that both (a) highly correlate with the ground-truth effect and (b) are well-scaled. Right: we plot the predicted loss (from MAGIC and the two baselines) against the true loss for a randomly chosen test point; each point is a training data subset with a random 1% of samples removed. For MAGIC we plot the predicted loss directly since it is well-scaled; for the baselines, we first rescale the predictions to match the variance of the ground-truth losses. Left: the average (taken across test examples) Spearman correlation between predicted and true model losses (also known as the LDS [ilyas2022datamodels, park2023trak]; see the section on single-model data attribution).
  • Figure 2: Smoothness aids predictive data attribution. We plot the change in data weights $\varepsilon$ against the change in model output $\Delta(\varepsilon)$ for two hypothetical learning algorithms. On the left is a non-smooth setting where the gradient $\partial f(\mathbf{w})/\partial \mathbf{w}$ varies wildly with $\varepsilon$. On the right is a smooth setting where the change is well-behaved.
  • Figure 3: Forward computation graph for a model output function $f$ mapping from data weights $\mathbf{w}$ to the model output. The exact influence function $\partial f(\mathbf{w})/\partial \mathbf{w}$ is the metagradient of the model output with respect to the data weights $\mathbf{w}$.
  • Figure 4: Linear datamodeling score (LDS) vs. drop fraction across settings for MAGIC and baselines. The estimates of MAGIC consistently correlate with the true model outputs (LDS near $1.0$ for sufficiently small drop fractions), while baselines often do not (LDS below $0.4$). For MAGIC, LDS decreases as the drop fraction increases, because the Taylor estimate moves further from the point around which it is expanded.
  • Figure 5: Results of MAGIC and baselines on randomly chosen, individual samples from the three settings we consider: CIFAR-10, Gemma-2B, and GPT-2. We evaluate by predicting model output after dropping a random 1%/5% of the data (cf. the LDS definition) and plotting the results against the true model output for that drop set. MAGIC estimates consistently correlate highly with the true output across settings (for small enough training data drop fractions).
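
The LDS used in these figures is described in the Figure 1 caption as the Spearman correlation between predicted and true model outputs, averaged across test examples. A minimal sketch of that computation follows; the function names (`ranks`, `spearman`, `lds`), the no-ties assumption, and the numbers are illustrative, not taken from the paper.

```python
def ranks(values):
    """Rank values from 0 to n-1 (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def lds(predicted, actual):
    """predicted[j][k] and actual[j][k] are the outputs on test example j
    for the k-th random training subset. Returns the per-example Spearman
    correlation averaged over test examples."""
    scores = [spearman(p, a) for p, a in zip(predicted, actual)]
    return sum(scores) / len(scores)


# Hypothetical outputs: 2 test examples, 4 random drop sets each.
actual = [[0.90, 0.40, 0.70, 0.10], [1.20, 1.50, 1.10, 1.30]]
# Predictions that preserve the ordering within each row but are badly scaled.
predicted = [[9.0, 4.5, 8.0, 0.0], [0.2, 0.9, 0.1, 0.3]]
score = lds(predicted, actual)
```

Because Spearman correlation depends only on ranks, the badly scaled predictions above still order the drop sets perfectly. This is why the Figure 1 caption treats being well-scaled as a separate property from correlating with the ground truth.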

Theorems & Definitions (4)

  • Remark 1: Single-model versus standard predictive data attribution
  • Example 1: Training a ResNet with SGD
  • Example 2: Training a language model with Adam
  • Remark 2: How restrictive is smoothness?