MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas, Logan Engstrom
TL;DR
MAGIC addresses predictive data attribution for non-convex deep learning by introducing a single-model attribution framework and a method to compute the exact influence of training data on a specific model. It uses Replay to obtain the exact metagradient of the model output with respect to data weights, enabling near-optimal estimation of how adding or removing data changes predictions in large-scale, iterative, smooth learners. Across vision and language benchmarks, MAGIC achieves near-perfect alignment with ground-truth changes (linear datamodeling score, LDS ≈ 1.0, for small data drops), significantly outperforming baselines. The approach enables practical operations such as targeted data deletion and model debugging with high fidelity, but at a higher computational cost that scales with the number of test samples; future work aims to improve efficiency and extend the method to broader scenarios.
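To make the idea of a metagradient with respect to data weights concrete, here is a minimal toy sketch. It is not the Replay algorithm and not the authors' implementation. It assumes a 1-D least-squares model trained by full-batch gradient descent, and it propagates the derivative of the final parameter with respect to each data weight through every update in forward mode. All names (`train`, `pearson`, `x_test`) and all hyperparameters are hypothetical. Pearson correlation is used as a simple stand-in for an agreement score between predicted and actual changes.

```python
import random


def train(xs, ys, w, steps=100, lr=0.5):
    """Full-batch gradient descent on a weighted 1-D least-squares loss.

    Returns the final parameter theta together with d(theta)/d(w_i) for every
    training point i, propagated exactly through every update (forward mode).
    """
    n = len(xs)
    theta = 0.0
    dtheta = [0.0] * n  # dtheta[i] tracks d(theta_t)/d(w_i)
    for _ in range(steps):
        # Per-example gradients g_i = x_i * (x_i * theta - y_i).
        g = [x * (x * theta - y) for x, y in zip(xs, ys)]
        # Second derivative of the weighted loss: mean_j(w_j * x_j^2).
        h = sum(wj * x * x for wj, x in zip(w, xs)) / n
        # Differentiate the update theta <- theta - lr * mean_j(w_j * g_j)
        # with respect to each w_i (uses the pre-update theta and dtheta).
        dtheta = [d - lr * (gi / n + h * d) for d, gi in zip(dtheta, g)]
        theta -= lr * sum(wj * gj for wj, gj in zip(w, g)) / n
    return theta, dtheta


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


random.seed(0)
n = 100
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]
x_test = 1.5  # hypothetical test input; the model output is x_test * theta

ones = [1.0] * n
theta, dtheta = train(xs, ys, ones)

predicted, actual = [], []
for i in range(n):
    w = ones[:]
    w[i] = 0.0  # drop training point i
    theta_drop, _ = train(xs, ys, w)
    # First-order prediction: the weight change is -1 for the dropped point.
    predicted.append(-x_test * dtheta[i])
    # Ground truth: retrain without the point and measure the output change.
    actual.append(x_test * (theta_drop - theta))

corr = pearson(predicted, actual)
print(f"correlation between predicted and actual changes: {corr:.4f}")
```

Forward-mode propagation keeps one derivative per training point, which is only practical for a toy problem of this size; the sketch is meant to show what quantity is being computed, not how to compute it at scale.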
Abstract
The goal of predictive data attribution is to estimate how adding or removing a given set of training datapoints will affect model predictions. In convex settings, this goal is straightforward (i.e., via the infinitesimal jackknife). In large-scale (non-convex) settings, however, existing methods are far less successful: their estimates often correlate only weakly with the ground truth. In this work, we present a new data attribution method (MAGIC) that combines classical methods and recent advances in metadifferentiation to (nearly) optimally estimate the effect of adding or removing training data on model predictions.
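As an illustration of the convex case mentioned above, the following sketch applies an infinitesimal-jackknife style estimate to a small L2-regularized logistic regression. It is a toy example under stated assumptions, not code from the paper: the model has a single parameter, the data are synthetic, and every name (`fit`, `LAM`, `x_test`) is hypothetical. The estimate uses the first-order relation d(theta)/d(w_i) = -g_i / H, where g_i is the gradient of example i at the optimum and H is the Hessian of the weighted objective, and compares it against actual leave-one-out retraining.

```python
import math
import random

LAM = 1.0  # assumed L2 regularization strength


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(xs, ys, w, iters=15):
    """Newton's method for weighted, L2-regularized 1-D logistic regression."""
    theta = 0.0
    for _ in range(iters):
        p = [sigmoid(x * theta) for x in xs]
        grad = sum(wi * (pi - y) * x for wi, pi, y, x in zip(w, p, ys, xs))
        grad += LAM * theta
        hess = sum(wi * pi * (1.0 - pi) * x * x for wi, pi, x in zip(w, p, xs))
        hess += LAM
        theta -= grad / hess
    return theta


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


random.seed(0)
n = 200
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1.0 if random.random() < sigmoid(1.5 * x) else 0.0 for x in xs]
x_test = 1.0  # hypothetical test input; the model output is the logit x_test * theta

ones = [1.0] * n
theta = fit(xs, ys, ones)

# Per-example gradients and the Hessian, both evaluated at the optimum.
p = [sigmoid(x * theta) for x in xs]
g = [(pi - y) * x for pi, y, x in zip(p, ys, xs)]
hess = sum(pi * (1.0 - pi) * x * x for pi, x in zip(p, xs)) + LAM

predicted, actual = [], []
for i in range(n):
    # Dropping point i changes w_i by -1, so theta moves by about +g_i / H.
    predicted.append(x_test * g[i] / hess)
    w = ones[:]
    w[i] = 0.0
    actual.append(x_test * (fit(xs, ys, w) - theta))

corr = pearson(predicted, actual)
print(f"correlation between jackknife estimates and retraining: {corr:.4f}")
```

The estimate relies on the objective being strongly convex with a unique optimum, so that the Hessian at the solution is invertible; that is the property that non-convex deep learning settings lack.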
