f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness

Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A. P, Sai Praneeth Karimireddy

TL;DR

This paper tackles the instability of data influence estimates caused by training randomness. It introduces f-influence, a hypothesis-testing framework that frames data influence as the distinguishability between training with and without a datapoint, connecting to Gaussian Differential Privacy via the Gaussian influence $G_\mu$ and its compositional and asymptotic properties. Building on this, the authors develop f-INE, a scalable algorithm that estimates influence in a single training run by monitoring gradient-based signals and leveraging a two-stage testing procedure to compute $\mu$. Empirically, f-INE outperforms baselines in identifying mislabeled MNIST samples and in attributing LLM behavior to poisoned training data, while displaying reduced variability across runs, supporting reliable data curation and model behavior understanding. The work also highlights a principled bridge between influence estimation, privacy auditing, and marginal data valuations, with implications for data Shapley and robust data cleaning in high-stakes settings.

Abstract

Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation or cleanup since it is unclear if we indeed deleted/kept the correct datapoints. To overcome this, we introduce *f-influence* -- a new influence estimation framework grounded in hypothesis testing that explicitly accounts for training randomness, and establish desirable properties that make it suitable for reliable influence estimation. We also design a highly efficient algorithm **f**-**IN**fluence **E**stimation (**f-INE**) that computes f-influence **in a single training run**. Finally, we scale up f-INE to estimate influence of instruction tuning data on Llama-3.1-8B and show it can reliably detect poisoned samples that steer model opinions, demonstrating its utility for data cleanup and attributing model behavior.

Paper Structure

This paper contains 25 sections, 17 theorems, 80 equations, 9 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.6

Let $f_i$ be a trade-off function for each $i \in [k]$. If $\mathcal{S}$ is $f_i$-influential with respect to algorithm $A_i$ for every $i \in [k]$, then the $k$-fold composed algorithm $A$ is at most $f_1 \otimes \ldots \otimes f_k$-influential.
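
To make the composition bound concrete, here is a minimal sketch for the special case where every $f_i$ is a Gaussian trade-off function $G_{\mu_i}$. It assumes two formulas from the Gaussian Differential Privacy literature that this summary does not state: $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and $G_{\mu_1} \otimes \ldots \otimes G_{\mu_k} = G_{\mu}$ with $\mu = \sqrt{\mu_1^2 + \ldots + \mu_k^2}$. The function names `gaussian_tradeoff` and `compose_gaussian` are hypothetical and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Type-II error G_mu(alpha) of the optimal test between N(0,1) and N(mu,1).

    Assumed formula (from the Gaussian DP literature):
    G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def compose_gaussian(mus) -> float:
    """Parameter of G_{mu_1} (x) ... (x) G_{mu_k}, assumed to be sqrt(sum of mu_i^2)."""
    return sqrt(sum(m * m for m in mus))


# Hypothetical example: 100 update steps, each with per-step influence mu_i = 0.1.
per_step_mus = [0.1] * 100
total_mu = compose_gaussian(per_step_mus)  # sqrt(100 * 0.01), i.e. about 1.0

# The composed curve lies below any single-step curve: distinguishing the two
# hypotheses gets easier as evidence accumulates across steps.
single_step_beta = gaussian_tradeoff(0.05, 0.1)
composed_beta = gaussian_tradeoff(0.05, total_mu)
```

Under these assumptions, per-step influence accumulates in quadrature rather than linearly, so many weakly informative steps can still yield a clearly distinguishable composed test.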

Figures (9)

  • Figure 1: Test losses on specific data points vary significantly across training runs due to intrinsic non-determinism in ML pipelines. Consequently, influence scores derived from such losses also inherit randomness. Decisions based on a single run, such as deleting seemingly low-influence data, may prove suboptimal in subsequent runs, potentially causing unexpected performance drops. Thus, a key challenge is how to properly account for training randomness in influence estimation.
  • Figure 2: (In)consistency of influence scores across multiple random seeds. Existing approaches such as Influence Functions, TRAK, and TraceIn exhibit significant variability due to sensitivity to data shuffling. This leads to low consistency scores. In contrast, our proposed method, f-INE, achieves a much higher consistency score, demonstrating robustness to training randomness.
  • Figure 3: Lack of total ordering in influence under training randomness: removing $d_1$ always decreases accuracy by $0.1\%$, while removing $d_2$ increases accuracy by $1\%$ but only with probability $0.1$. Both have the same mean influence, yet it is unclear which one is more influential. This problem arises as there is a lack of total order in defining data influence under training randomness.
  • Figure 4: Lack of total order between arbitrary trade-off functions: no trade-off curve dominates the other. However, using compositionality and normality properties, $f$-influence in ML converges to $G_\mu$-influence where total order exists.
  • Figure 5: Overview of the f-INE algorithm: Given a user-specified data subset $\mathcal{S}$, our method quantifies the influence of $\mathcal{S}$ as the statistical distinguishability between two distributions $P$ and $Q$. $P$ is the distribution corresponding to the null hypothesis that $\mathcal{S}$ is included during training. $Q$ is the distribution corresponding to the alternative hypothesis that $\mathcal{S}$ is excluded from training. To estimate the influence value $\mu$, samples from $P$ are obtained using the model's gradient similarity on a random data batch that includes $\mathcal{S}$. Similarly, samples from $Q$ are obtained using the model's gradient similarity on a random data batch that excludes $\mathcal{S}$. These samples are collected at each update step of a single training run, which makes the method highly scalable.
  • ...and 4 more figures
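
The overview in Figure 5 reduces influence estimation to distinguishing samples of $P$ from samples of $Q$. The sketch below illustrates one way a single $\mu$ could be read off such samples; it is not the authors' f-INE procedure. It assumes scalar scores, assumes that scores under $P$ tend to be larger than under $Q$, uses a simple threshold test, and inverts the assumed Gaussian trade-off relation $\beta = \Phi(\Phi^{-1}(1-\alpha) - \mu)$. Choosing the threshold on one half of the samples and measuring errors on the other half is our own choice, and `estimate_mu` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def _clip(rate: float, n: int) -> float:
    """Keep an empirical error rate strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(rate, 0.5 / n), 1.0 - 0.5 / n)


def estimate_mu(p_scores, q_scores) -> float:
    """Estimate a Gaussian influence parameter from scores drawn under P and Q.

    Assumes P scores (subset included) tend to exceed Q scores (subset excluded).
    """
    half_p, half_q = len(p_scores) // 2, len(q_scores) // 2

    # Stage 1: pick a threshold from the first half of each sample.
    threshold = 0.5 * (fmean(p_scores[:half_p]) + fmean(q_scores[:half_q]))

    # Stage 2: measure both error rates on the held-out halves.
    held_p, held_q = p_scores[half_p:], q_scores[half_q:]
    # Type-I error: the null (P, included) is true but the test says "excluded".
    alpha = sum(s < threshold for s in held_p) / len(held_p)
    # Type-II error: the alternative (Q, excluded) is true but the test says "included".
    beta = sum(s >= threshold for s in held_q) / len(held_q)

    alpha = _clip(alpha, len(held_p))
    beta = _clip(beta, len(held_q))
    # Invert beta = Phi(Phi^{-1}(1 - alpha) - mu) for mu.
    return _STD_NORMAL.inv_cdf(1.0 - alpha) - _STD_NORMAL.inv_cdf(beta)


# Synthetic stand-in for per-step gradient-similarity scores (not real data):
# P is shifted by 1.0 relative to Q, so the estimate should land near 1.0.
rng = random.Random(0)
p_scores = [rng.gauss(1.0, 1.0) for _ in range(20000)]
q_scores = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mu_hat = estimate_mu(p_scores, q_scores)

# Identical distributions are indistinguishable, so the estimate should be near 0.
same_scores = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mu_null = estimate_mu(same_scores, q_scores)
```

In a real run the two score lists would come from gradient-similarity measurements logged at each update step; here they are simulated only to check that the inversion recovers a known shift.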

Theorems & Definitions (30)

  • Definition 2.1: Informal: hypothesis testing based influence
  • Definition 2.2: type-I and type-II errors
  • Definition 2.3: trade-off function
  • Definition 2.4: f-influence
  • Definition 2.5: Canonical influence: Gaussian or $G_\mu$-influence
  • Theorem 2.6: compositionality
  • Corollary 2.7
  • Theorem 2.8: informal asymptotic normality
  • Proposition C.1
  • proof
  • ...and 20 more
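
For orientation, the objects named in Definitions 2.2, 2.3, and 2.5 can be written out as follows. These are the standard forms from the hypothesis-testing and Gaussian Differential Privacy literature, given here as an assumption; the paper's exact statements may differ in notation. For a test $\phi$ with values in $[0,1]$ that decides between distributions $P$ (null) and $Q$ (alternative):

$$
\begin{aligned}
\alpha_\phi &= \mathbb{E}_{P}[\phi], \qquad \beta_\phi = 1 - \mathbb{E}_{Q}[\phi], \\
T(P, Q)(\alpha) &= \inf_{\phi} \{\, \beta_\phi : \alpha_\phi \le \alpha \,\}, \\
G_\mu &= T\big(\mathcal{N}(0,1), \mathcal{N}(\mu,1)\big), \qquad G_\mu(\alpha) = \Phi\big(\Phi^{-1}(1-\alpha) - \mu\big).
\end{aligned}
$$

Here $\Phi$ is the standard normal CDF. Because $G_{\mu_1}(\alpha) \ge G_{\mu_2}(\alpha)$ for all $\alpha$ exactly when $\mu_1 \le \mu_2$, the Gaussian family is totally ordered by the single number $\mu$. Arbitrary trade-off functions lack this property, which is the point of Figure 4.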