Learning to Weight Parameters for Training Data Attribution

Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

TL;DR

The paper addresses the challenge of attributing model outputs to training data by revealing that parameter importance is heterogeneous across network components. It introduces a data-driven, self-supervised framework that learns non-negative weights for parameter groups, reweighting gradient-based attribution signals to maximize a proxy for signal-to-noise ratio. The method generalizes across vision, language, and diffusion tasks, boosting attribution quality (as measured by LDS) and enabling fine-grained semantic attribution (subject/style/background). This approach yields interpretable insights into which architectural components influence specific aspects of generation and demonstrates robust transferability across datasets and attribution methods, with potential implications for transparency and data governance. Future work could push toward parameter-level weighting to further enhance attribution precision.

Abstract

We study gradient-based data attribution, aiming to identify which training examples most influence a given output. Existing methods for this task either treat network parameters uniformly or rely on implicit weighting derived from Hessian approximations, which do not fully model functional heterogeneity of network parameters. To address this, we propose a method to explicitly learn parameter importance weights directly from data, without requiring annotated labels. Our approach improves attribution accuracy across diverse tasks, including image classification, language modeling, and diffusion, and enables fine-grained attribution for concepts like subject and style.
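The core idea of reweighting gradient-based attribution by parameter group can be illustrated with a minimal sketch. It assumes the simplest possible attribution signal, a plain dot product between the query gradient and a training example's gradient, computed separately per parameter group and then combined with non-negative weights. The function names, the toy gradients, and the weight values are hypothetical; the paper's actual attribution methods and its self-supervised procedure for learning the weights are not reproduced here.

```python
from typing import Dict, List

# Hypothetical representation: parameter-group name -> flattened gradient.
Grad = Dict[str, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def group_scores(query_grad: Grad, train_grad: Grad) -> Dict[str, float]:
    """Per-group gradient similarity between a query and one training example."""
    return {g: dot(query_grad[g], train_grad[g]) for g in query_grad}


def weighted_attribution(query_grad: Grad, train_grad: Grad,
                         weights: Dict[str, float]) -> float:
    """Sum of per-group scores, each scaled by a non-negative group weight."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("group weights must be non-negative")
    scores = group_scores(query_grad, train_grad)
    return sum(weights[g] * s for g, s in scores.items())


# Toy gradients for two groups (names borrowed from the UNet naming in Figure 1).
query = {"attn1": [1.0, 0.0], "attn2.to_q": [0.5, 0.5]}
train = {"attn1": [2.0, 1.0], "attn2.to_q": [1.0, 3.0]}

uniform = {"attn1": 1.0, "attn2.to_q": 1.0}    # all groups treated alike
learned = {"attn1": 0.25, "attn2.to_q": 2.0}   # made-up values, for illustration only

print(weighted_attribution(query, train, uniform))  # 4.0
print(weighted_attribution(query, train, learned))  # 4.5
```

With uniform weights the sketch reduces to an ordinary full-gradient dot product; non-uniform weights let groups with a cleaner attribution signal dominate the final score.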

Paper Structure

This paper contains 43 sections, 26 equations, 6 figures, 11 tables, 1 algorithm.

Figures (6)

  • Figure 1: Attribution Strength Across Parameter Groups measured using the Linear Datamodeling Score (LDS) (Park et al., 2023). (a) The LDS computed using gradients from individual parameter groups shows significant variations in attribution strength. (b) Aggregating the LDS by block depth (e.g., "down_blocks.0" representing the first downsampling block in the UNet) and functionality within each block (e.g., "attn1" representing the self-attention layers, "attn2.to_q" representing the query projection layers in cross-attention modules) reveals significantly varied attribution strength for different network depths and functional components. Error bars are calculated over 1,000 samples.
  • Figure 2: Parameter specialization by semantic element. Heatmaps show average recall@10 for attributing (a) subject, (b) style, and (c) background using gradients from different parameter groups (UNet block depth and attention components). Brighter indicates stronger attribution strength for that semantic element by that parameter group.
  • Figure 3: Learned Parameter Importance Weights for overall (top-left), subject (top-right), style (bottom-left), and background (bottom-right) attribution. Each plot shows the learned weights across 256 parameter groups in the UNet, organized by Down, Mid, and Up blocks, with different colors indicating each block type.
  • Figure 4: Fine-grained attribution example. The query image (top-left) was generated with the prompt "A watercolor illustration of Eevee, in a bamboo forest". The first row shows the top-5 positive and negative training samples identified by standard D-TRAK without weights. Each subsequent row (Subject, Style, Background) displays training samples retrieved using specialized weights learned for that semantic element. Images marked 'S', 'P', or 'B' are ground-truth contributors for style, subject, or background, respectively.
  • Figure 5: Illustrative examples from the SB-Pokemon dataset, showcasing samples of Pokemon subjects, artistic styles, and environmental backgrounds. These images were generated using MidJourney V6.1 from the categories listed in Table \ref{tab:sb_pokemon_categories}.
  • ...and 1 more figure
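The two metrics named in the captions above, LDS (Figure 1) and recall@10 (Figure 2), can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's evaluation code: LDS is taken in its usual form for a single query, the Spearman rank correlation between the summed attribution scores of each sampled training subset and the output of a model retrained on that subset, and recall@k is taken to be the fraction of ground-truth contributors found among the k highest-scoring training samples. All function names and toy numbers are hypothetical.

```python
from typing import List, Sequence, Set


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lds(scores: Sequence[float], subsets: Sequence[Sequence[int]],
        retrained_outputs: Sequence[float]) -> float:
    """Spearman correlation between predicted and measured outputs per subset."""
    predicted = [sum(scores[i] for i in subset) for subset in subsets]
    return pearson(average_ranks(predicted), average_ranks(retrained_outputs))


def recall_at_k(scores: Sequence[float], ground_truth: Set[int], k: int = 10) -> float:
    """Fraction of ground-truth training indices among the top-k scores."""
    if not ground_truth:
        raise ValueError("ground_truth must be non-empty")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return len(set(top) & ground_truth) / len(ground_truth)


# Toy data: four training samples, four sampled subsets, made-up retrained outputs.
scores = [0.9, 0.1, 0.5, -0.2]
subsets = [[0, 1], [2, 3], [0, 2], [1, 3]]
outputs = [2.0, 1.0, 3.0, 0.5]

print(lds(scores, subsets, outputs))
print(recall_at_k(scores, {0, 3}, k=2))
```

In the toy data the predicted subset scores and the retrained outputs are ordered identically, so the LDS is at its maximum of 1; in practice the score is averaged over many queries and subsets.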