Enhancing Training Data Attribution with Representational Optimization

Weiwei Sun, Haokun Liu, Nikhil Kandpal, Colin Raffel, Yiming Yang

TL;DR

AirRep tackles the challenge of scalable, accurate training data attribution by learning task- and model-aligned representations optimized for attribution quality. It combines a trainable encoder with an attention-based pooling mechanism and is trained with a weighted pairwise ranking objective on automatically generated data subsets, aligning predicted scores with actual model losses. Empirically, AirRep matches or surpasses state-of-the-art gradient-based TDA methods on instruction-tuned LLMs while delivering nearly two orders of magnitude faster inference and substantially lower storage. The approach generalizes across tasks and models, and its training cost can be amortized by reusing a single trained AirRep model across different target LMs. This makes AirRep a practical, scalable solution for data-centric AI workflows, including data attribution, data selection, and explainability in large-scale NLP systems.
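
As a rough illustration of a weighted pairwise ranking objective, the sketch below takes predicted scores for several training subsets of one test example and penalizes pairs that the scores order against the observed losses. It is a minimal sketch under stated assumptions, not AirRep's implementation: a subset counts as better when training on it yields lower loss on the test example, each pair is weighted by the absolute loss gap, and a logistic pairwise loss is used. The function name `weighted_pairwise_ranking_loss` and the weighting scheme are hypothetical.

```python
import numpy as np

def weighted_pairwise_ranking_loss(scores, losses):
    """Weighted pairwise logistic ranking loss for one test example.

    scores: predicted scores s(x, S_j) for m training subsets, shape (m,)
    losses: observed loss on x after training on each S_j, shape (m,)
    ASSUMPTIONS (hypothetical): lower loss means a better subset, and each
    ordered pair is weighted by its absolute loss gap.
    """
    scores = np.asarray(scores, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # diff_s[i, j] = s_i - s_j
    diff_s = scores[:, None] - scores[None, :]
    # gap[i, j] = loss_j - loss_i, positive when subset i beats subset j
    gap = losses[None, :] - losses[:, None]
    mask = gap > 0
    if not mask.any():
        return 0.0  # no strictly ordered pairs, nothing to rank
    weights = gap[mask]
    # log(1 + exp(-(s_i - s_j))), computed stably via logaddexp
    pair_loss = np.logaddexp(0.0, -diff_s[mask])
    return float(np.sum(weights * pair_loss) / np.sum(weights))
```

Scores that agree with the observed loss ordering are penalized less than reversed ones: with losses `[1, 2, 3]`, scores `[3, 2, 1]` give a smaller value than scores `[1, 2, 3]`.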

Abstract

Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. While gradient-based attribution methods, such as influence functions, offer theoretical grounding, their computational costs make them impractical for large-scale applications. Representation-based approaches are far more scalable, but typically rely on heuristic embeddings that are not optimized for attribution, limiting their fidelity. To address these challenges, we propose AirRep, a scalable, representation-based approach that closes this gap by learning task-specific and model-aligned representations optimized explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence. We train AirRep using a ranking objective over automatically constructed training subsets labeled by their empirical effect on target predictions. Experiments on instruction-tuned LLMs demonstrate that AirRep achieves performance on par with state-of-the-art gradient-based approaches while being nearly two orders of magnitude more efficient at inference time. Further analysis highlights its robustness and generalization across tasks and models. Our code is available at https://github.com/sunnweiwei/AirRep

Paper Structure

This paper contains 62 sections, 1 theorem, 92 equations, 9 figures, 9 tables.

Key Result

Theorem B.1

Let $\phi(z) = H_{\boldsymbol{\theta}}^{-\tfrac{1}{2}} \nabla_\theta \ell\bigl(z; \theta\bigr)$, and assume that the derivatives $\nabla_\theta^{k} \ell(z; \theta)$ of order $k \geq 3$ are negligible. Then the $k$-th order group influence function is given by: where $c_{t}^{(k)}$ are constants at the $t$-th order that depend on meta-information about the distribution of $S$, and $\alpha_{t}(z)$ is defined as:
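
The definition of $\phi(z)$ makes the first-order part of group influence expressible as an inner product: $\phi(z_{\text{test}})^\top \sum_{z_i \in S} \phi(z_i) = \nabla_\theta \ell(z_{\text{test}};\theta)^\top H_{\theta}^{-1} \sum_{z_i \in S} \nabla_\theta \ell(z_i;\theta)$, which is the classic influence-function form up to its sign convention. The NumPy sketch below checks this identity numerically, using a random symmetric positive-definite matrix as a stand-in for $H_\theta$ and random vectors as stand-ins for gradients. It covers only the first-order term, not the higher-order terms of Theorem B.1.

```python
import numpy as np

def inv_sqrt_spd(H):
    """Return H^{-1/2} for a symmetric positive-definite matrix H."""
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d))
H = A @ A.T + d * np.eye(d)        # toy SPD stand-in for the Hessian H_theta
g_test = rng.normal(size=d)        # stand-in for grad of loss on the test example
G_train = rng.normal(size=(4, d))  # stand-ins for gradients of the examples in S

H_inv_sqrt = inv_sqrt_spd(H)
phi_test = H_inv_sqrt @ g_test
phi_train = G_train @ H_inv_sqrt   # row i is phi(z_i), since H^{-1/2} is symmetric

# First-order group term: sum_i phi(z_test) . phi(z_i)
group_first_order = phi_test @ phi_train.sum(axis=0)
# Same quantity in classic form: g_test^T H^{-1} sum_i g_i
classic = g_test @ np.linalg.solve(H, G_train.sum(axis=0))
print(np.isclose(group_first_order, classic))  # prints True
```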

Figures (9)

  • Figure 1: Performance comparison of gradient-based and representation-based training data attribution (TDA) approaches. Left: average linear datamodeling score (LDS) (Ilyas et al., 2022) on 4 unseen datasets (FLAN, Alpaca, Tulu, SafeRLHF). AirRep outperforms state-of-the-art gradient-based methods such as LoGra, which uses 48$\times$ more storage. Right: inference speed (encoded examples per second on a single GPU). AirRep is nearly two orders of magnitude more efficient than gradient-based methods at inference time.
  • Figure 2: Model Architecture and Optimization. The test example $x$ and the training subsets $S_1$ and $S_2$ are encoded by an encoder with a pooler to obtain embeddings. The score is computed as the inner product of the embeddings. The overall model is trained based on pairwise comparisons to distinguish the usefulness of different subsets with respect to the test example $x$.
  • Figure 3: LDS correlation scores on FLAN versus inference-time cost and storage for various TDA methods across Qwen2.5 models (0.5B–7B). Computation time (log10 scale) is measured relative to GTE-Small on the same machine; marker size reflects storage (smaller is more efficient). Each method has multiple points for different model and dimension settings.
  • Figure 4: Data selection results. We report the average F1 score over 66 FLAN tasks, obtained by training Qwen2.5 LMs of different sizes on the top-1000 selected examples for each task.
  • Figure 5: Ablation study results: average LDS on FLAN, Alpaca, Tulu, and SafeRLHF.
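
The LDS metric in Figure 1 can be sketched with its standard definition: for a test example, sample training subsets, predict each subset's effect (by default, the sum of per-example attribution scores), and report the Spearman rank correlation between those predictions and the model outputs actually observed after training on each subset. This is a minimal sketch under stated assumptions: `actual_outputs` is taken to be a quantity where higher means more helpful (for example, negative test loss), and a group score such as an attention-pooled one could be passed to `lds` in place of the additive prediction. The helper names are hypothetical.

```python
import numpy as np

def rankdata(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average rank of the tie block
        i = j + 1
    return ranks

def additive_group_scores(attributions, subsets):
    """Predicted effect of each subset as the sum of its members' attribution scores."""
    attributions = np.asarray(attributions, dtype=float)
    return [float(attributions[list(idx)].sum()) for idx in subsets]

def lds(predicted_group_scores, actual_outputs):
    """Spearman correlation between predicted and observed per-subset outputs.

    ASSUMPTION: actual_outputs are oriented so higher = more helpful
    (e.g. negative test loss after training on each subset).
    """
    ra = rankdata(predicted_group_scores)
    rb = rankdata(actual_outputs)
    return float(np.corrcoef(ra, rb)[0, 1])
```

Averaging `lds` over many test examples gives the dataset-level score that Figure 1 reports per dataset.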
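
The scoring path in Figure 2 (encode, pool the subset, take an inner product, then compare two subsets) can be sketched in NumPy, assuming the encoder has already produced one embedding per example. The pooler here is a hypothetical softmax attention that uses the test embedding as the query with scaled dot-product weights; AirRep's actual pooler may parameterize the query and weights differently. The names `attention_pool` and `subset_score` are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(member_embs, query):
    """Pool the (n, d) embeddings of a subset's examples into one (d,) vector.

    HYPOTHETICAL pooler: scaled dot-product attention with `query` as the query.
    """
    d = member_embs.shape[1]
    weights = softmax(member_embs @ query / np.sqrt(d))  # (n,), sums to 1
    return weights @ member_embs

def subset_score(test_emb, member_embs):
    """Score for subset S and test example x: inner product of x with pooled S."""
    return float(test_emb @ attention_pool(member_embs, test_emb))

# Pairwise comparison in the spirit of Figure 2: does S1 or S2 score higher for x?
rng = np.random.default_rng(0)
x_emb = rng.normal(size=8)
S1 = rng.normal(size=(5, 8))
S2 = rng.normal(size=(5, 8))
prefer_S1 = subset_score(x_emb, S1) > subset_score(x_emb, S2)
```

Because the attention weights sum to one, this pooled vector is a convex combination of member embeddings, so the score does not grow with subset size; a sum-pooling variant would scale with the number of members instead.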
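
The data-selection setup in Figure 4 (keeping the top-1000 training examples per task) could be sketched as ranking training examples by their embedding score against a task's test examples. Aggregating by the mean inner product over the task's test examples is an assumption of this sketch, and `select_top_k` is a hypothetical helper.

```python
import numpy as np

def select_top_k(train_embs, task_test_embs, k):
    """Indices of the k training examples with the highest mean score for a task.

    ASSUMPTION: a training example's score for a task is the mean inner
    product between its embedding and the task's test-example embeddings.
    """
    scores = train_embs @ task_test_embs.T   # (n_train, n_test)
    mean_scores = scores.mean(axis=1)        # (n_train,)
    k = min(k, len(mean_scores))
    if k <= 0:
        return np.array([], dtype=int)
    top = np.argpartition(-mean_scores, k - 1)[:k]  # unordered top-k
    return top[np.argsort(-mean_scores[top])]       # sorted by descending score
```

With `k=1000` and one call per task, this mirrors the selection budget in the caption; the returned indices would then form that task's fine-tuning set. `np.argpartition` avoids a full sort of the training pool before ordering only the selected examples.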

Theorems & Definitions (1)

  • Theorem B.1: Group Influence Function