On the Feasibility of In-Context Probing for Data Attribution

Cathy Jiao, Gary Gao, Aditi Raghunathan, Chenyan Xiong

TL;DR

This work investigates whether in-context probing (ICP) can function as a fast proxy for gradient-based data attribution in data selection. By connecting ICP to gradient-based influence through local-data and implicit-gradient-descent perspectives, the authors show strong in-domain agreement between ICP and influence-based methods on NLP tasks and synthetic data, and demonstrate comparable downstream gains when fine-tuning on data ranked by either method. They also demonstrate cost benefits of ICP for data curation and provide a controlled synthetic study to isolate the mechanism behind the ICP–influence link. However, the connection weakens in out-of-domain settings, highlighting the need for bridging work to extend ICP’s applicability. Overall, the paper offers a practical pathway for efficient, data-centric model refinement in in-domain regimes and motivates future theoretical and empirical work on black-box data attribution.

Abstract

Data attribution methods measure the contribution of training data towards model outputs, and have several important applications in areas such as dataset curation and model interpretability. However, many standard data attribution methods, such as influence functions, utilize model gradients and are computationally expensive. In our paper, we show that in-context probing (ICP) -- prompting an LLM -- can serve as a fast proxy for gradient-based data attribution for data selection under conditions contingent on data similarity. We study this connection empirically on standard NLP tasks, and show that ICP and gradient-based data attribution are well-correlated in identifying influential training data for tasks that share a similar task type and content with the training data. Additionally, fine-tuning models on influential data selected by both methods achieves comparable downstream performance, further emphasizing their similarities. We also examine the connection between ICP and gradient-based data attribution using synthetic data on linear regression tasks. Our synthetic data experiments show results similar to those from NLP tasks, suggesting that this connection can be isolated in simpler settings, which offers a pathway to bridging their differences.

Paper Structure (15 sections, 1 theorem, 23 equations, 7 figures, 5 tables)

Key Result

Lemma 1

(pruthi2020estimating) Suppose we have an LLM with parameters $\theta$. We perform a gradient descent step on training sample $z$ with learning rate $\eta$ such that $\hat{\theta} = \theta - \eta \nabla \mathcal{L}(z; \theta)$. Then,

Proof: First, we consider the change in loss of $z'$ using a first-order approximation:

Next, suppose a gradient descent step is taken on training sample $z$, and t
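
The displayed equations of the lemma are not reproduced in this excerpt. Assuming it states the standard first-order result, the Taylor expansion $\mathcal{L}(z'; \hat{\theta}) \approx \mathcal{L}(z'; \theta) + \nabla \mathcal{L}(z'; \theta) \cdot (\hat{\theta} - \theta)$ combined with the update $\hat{\theta} - \theta = -\eta \nabla \mathcal{L}(z; \theta)$ gives $\mathcal{L}(z'; \theta) - \mathcal{L}(z'; \hat{\theta}) \approx \eta \, \nabla \mathcal{L}(z'; \theta) \cdot \nabla \mathcal{L}(z; \theta)$. The Python sketch below checks this approximation numerically on a toy linear regression model with squared loss. The model, the data values, and all function names are illustrative assumptions, not the authors' setup or code.

```python
# Toy check of the first-order influence approximation (assumed form of Lemma 1).
# Hypothetical setup: linear model with squared loss L(z; w) = 0.5 * (w.x - y)^2.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(w, z):
    """Squared loss of weights w on sample z = (x, y)."""
    x, y = z
    return 0.5 * (dot(w, x) - y) ** 2

def grad(w, z):
    """Gradient of the squared loss with respect to w."""
    x, y = z
    residual = dot(w, x) - y
    return [residual * xi for xi in x]

def sgd_step(w, z, eta):
    """One gradient descent step on a single training sample z."""
    g = grad(w, z)
    return [wi - eta * gi for wi, gi in zip(w, g)]

def first_order_influence(w, z_train, z_test, eta):
    """eta times the inner product of the test and train gradients."""
    return eta * dot(grad(w, z_test), grad(w, z_train))

# Illustrative values only (not taken from the paper).
w = [0.5, -1.0, 2.0]
z_train = ([1.0, 2.0, -1.0], 0.3)
z_test = ([0.5, 1.5, -0.5], 0.1)
eta = 1e-3

w_hat = sgd_step(w, z_train, eta)
actual = loss(w, z_test) - loss(w_hat, z_test)   # true drop in test loss
approx = first_order_influence(w, z_train, z_test, eta)

print(f"actual loss drop: {actual:.6f}")
print(f"first-order estimate: {approx:.6f}")
```

For this quadratic loss the gap between the two quantities is second-order in $\eta$, so shrinking the learning rate brings the estimate closer to the actual loss drop.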

Figures (7)

  • Figure 1: Correlation analysis between ICP, $\text{Infl}_{\text{Loc}}$, and $\text{Infl}_{\text{IP}}$ (aggregated across groups of 500 samples) with respect to content similarity (BertScore) using test and train samples from the same task.
  • Figure 2: Correlation analysis between ICP, $\text{Infl}_{\text{Loc}}$, and $\text{Infl}_{\text{IP}}$ (aggregated across groups of 500 samples) with respect to content similarity (BertScore) using test samples from Alpaca and training samples from DocQA/Pretrain datasets. Additional analysis in Appendix \ref{sec:appendix-tables-figures}.
  • Figure 3: Correlation analysis between rankings on the instructions from the Alpaca dataset assigned by ICP, $\text{Infl}_{\text{Loc}}$, and $\text{Infl}_{\text{IP}}$. All p-values are $<.05$.
  • Figure 4: Correlation analysis between ICP and $\text{Infl}_{\text{IP}}$ as the task/content similarity of a single training demonstration varies with respect to the test query.
  • Figure 5: Correlation analysis between ICP and $\text{Infl}_{\text{IP}}$ as both the task and content similarity of a single training demonstration vary with respect to the test query.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Lemma 1