The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

TL;DR

The paper introduces the Mirrored Influence Hypothesis, which posits a reciprocal relationship between train-to-test and test-to-train data influence, and uses it to develop Forward-INF, an algorithm that computes gradients only for the test samples and needs just a forward pass per training point, making influence estimation much cheaper on large models. The hypothesis is supported by empirical and theoretical validation, and the method is demonstrated on diffusion-model attribution, data leakage detection, memorization analysis, mislabeled data detection, and language-model behavior tracing, where it is reported to achieve high accuracy and large speedups over gradient-based baselines. The work offers a practical, scalable tool for data-centric auditing and transparency in real-world AI systems.

Abstract

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
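The idea in the abstract (gradients for the test samples only, then one forward pass per training point) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tiny logistic-regression model, a single gradient-descent step on the test samples, and a score defined as the drop in each training point's loss after that step. The names `forward_influence`, `loss`, `grad`, and the learning rate `lr` are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss(w, x, y):
    """Binary cross-entropy of a logistic model with weights w on one (x, y)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def grad(w, x, y):
    """Gradient of the loss above with respect to w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def forward_influence(w, train_set, test_set, lr=0.1):
    """Score each training point by how much its loss drops after the model
    takes one gradient step on the test samples (assumed scoring rule)."""
    # Backward computation happens only here, on the (few) test samples.
    g = [0.0] * len(w)
    for x, y in test_set:
        for j, gj in enumerate(grad(w, x, y)):
            g[j] += gj / len(test_set)
    w_new = [wi - lr * gi for wi, gi in zip(w, g)]
    # Each training point needs only forward passes (loss before and after).
    return [loss(w, x, y) - loss(w_new, x, y) for x, y in train_set]


# Toy data: one test sample and three candidate training points.
w0 = [0.0, 0.0]
test_set = [([1.0, 0.0], 1)]
train_set = [
    ([1.0, 0.0], 1),  # same input and label as the test sample
    ([1.0, 0.0], 0),  # same input, opposite label
    ([0.0, 1.0], 1),  # orthogonal input
]
scores = forward_influence(w0, train_set, test_set)
best = max(range(len(scores)), key=lambda i: scores[i])
print(scores, best)
```

In this toy setting the training point that matches the test sample receives the highest score, the conflicting point receives a negative score, and the orthogonal point is unaffected. The cost asymmetry is visible in the structure: the gradient loop runs over `test_set`, while `train_set` is touched only by loss evaluations.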

Paper Structure (33 sections, 15 equations, 13 figures, 10 tables, 1 algorithm)

Figures (13)

  • Figure 1: Overview of our approach and comparison with prior work, which can be broadly categorized into re-training-based and gradient-based methods. The former requires re-training models on many different subsets of the training data [ilyas2022datamodels, jia2019efficient, ghorbani2019data, feldman2020neural]. The latter computes influences from training-data gradients (TracIn [pruthi2020estimating] is illustrated as an example in the second column). Our proposed method $\texttt{Forward-INF}$ requires only a forward pass for each training point, offering a significant efficiency improvement.
  • Figure 2: We observe a high correlation between train-to-test influence $\text{Inf}(D_i\rightarrow D_\text{tst})$ and test-to-train influence $\text{Inf}(D_i\leftarrow D_\text{tst})$. The average Pearson correlation is 0.9673 for logistic regression and 0.8851 for a CNN trained on CIFAR-10.
  • Figure 3: Data attribution in diffusion models. For each synthesized sample in the second column, obtained by fine-tuning on the image in the first column, we show the points with the highest influence in the candidate set. Our method assigns the highest influence to the fine-tuning image, i.e., the point that most influences the synthesized image.
  • Figure 4: Memorization analysis, where the goal is to identify which training point's memorization is critical for predicting a specific test point. Prior work [feldman2020neural] proposed an algorithm to compute memorized training-test pairs, but it requires re-training the target model many times. We show that $\texttt{Forward-INF}$ can identify the same memorized pairs without the need for re-training.
  • Figure 5: Mislabeled data detection in a subset of CIFAR-10. (Left) Mislabeled data detection performance comparison between $\texttt{Forward-INF}$ and IF. (Right) Computation time comparison between methods. $\texttt{Forward-INF}$ is not only effective in detecting mislabeled training data but also efficient in its computation.
  • ...and 8 more figures