Learning to Weight Parameters for Training Data Attribution
Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann
TL;DR
The paper addresses training data attribution, the problem of tracing model outputs back to training examples, and observes that parameter importance is heterogeneous across network components. It introduces a data-driven, self-supervised framework that learns non-negative weights for parameter groups and uses them to reweight gradient-based attribution signals so as to maximize a proxy for signal-to-noise ratio. The method applies across vision, language, and diffusion tasks, improves attribution quality as measured by the linear datamodeling score (LDS), and enables fine-grained semantic attribution (subject, style, background). The approach offers interpretable insight into which architectural components influence specific aspects of generation, and it transfers across datasets and attribution methods, with potential implications for transparency and data governance. Future work could move toward parameter-level weighting to further improve attribution precision.
Abstract
We study gradient-based data attribution, which aims to identify the training examples that most influence a given output. Existing methods for this task either treat network parameters uniformly or rely on implicit weighting derived from Hessian approximations, which does not fully model the functional heterogeneity of network parameters. To address this, we propose a method that explicitly learns parameter importance weights directly from data, without requiring annotated labels. Our approach improves attribution accuracy across diverse tasks, including image classification, language modeling, and diffusion, and enables fine-grained attribution for concepts such as subject and style.
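To make the idea of reweighting gradient-based attribution by parameter group concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the base attribution signal is a simple dot product between test and training gradients, that the parameters are split into two hypothetical groups named "attention" and "mlp", and that the non-negative group weights are set by hand rather than learned. All function and variable names are illustrative.

```python
from typing import Dict, List

# Hypothetical layout: group name -> flattened gradient for that parameter group.
Gradients = Dict[str, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product between two equally sized gradient vectors."""
    if len(a) != len(b):
        raise ValueError("gradient vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def weighted_attribution(test_grads: Gradients,
                         train_grads: Gradients,
                         group_weights: Dict[str, float]) -> float:
    """Sum of per-group gradient similarities, each scaled by its group weight.

    Assumption: the base signal is a gradient dot product. The weights are
    required to be non-negative, as in the description above.
    """
    score = 0.0
    for group, weight in group_weights.items():
        if weight < 0:
            raise ValueError("group weights must be non-negative")
        score += weight * dot(test_grads[group], train_grads[group])
    return score


def rank_training_examples(test_grads: Gradients,
                           train_grads_list: List[Gradients],
                           group_weights: Dict[str, float]) -> List[int]:
    """Return training indices ordered from most to least influential."""
    scores = [weighted_attribution(test_grads, g, group_weights)
              for g in train_grads_list]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: two training examples, each similar to the test query in one group.
test_grads = {"attention": [1.0, 0.0], "mlp": [0.0, 2.0]}
train_a = {"attention": [1.0, 0.0], "mlp": [0.0, 0.0]}
train_b = {"attention": [0.0, 0.0], "mlp": [0.0, 1.0]}

uniform = {"attention": 1.0, "mlp": 1.0}      # all groups treated equally
reweighted = {"attention": 3.0, "mlp": 1.0}   # hand-set, not learned

print(rank_training_examples(test_grads, [train_a, train_b], uniform))
print(rank_training_examples(test_grads, [train_a, train_b], reweighted))
```

With uniform weights the second training example ranks first. Upweighting the "attention" group flips the ranking, which illustrates how group weights can change which training examples are attributed. In the paper the weights are learned from data in a self-supervised way; that learning step is not shown here.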
