f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A. P, Sai Praneeth Karimireddy
TL;DR
This paper tackles the instability of data influence estimates caused by training randomness. It introduces f-influence, a hypothesis-testing framework that frames data influence as the distinguishability between training with and without a datapoint, connecting to Gaussian Differential Privacy via the Gaussian influence $G_\mu$ and its compositional and asymptotic properties. Building on this, the authors develop f-INE, a scalable algorithm that estimates influence in a single training run by monitoring gradient-based signals and leveraging a two-stage testing procedure to compute $\mu$. Empirically, f-INE outperforms baselines in identifying mislabeled MNIST samples and in attributing LLM behavior to poisoned training data, while displaying reduced variability across runs, supporting reliable data curation and model behavior understanding. The work also highlights a principled bridge between influence estimation, privacy auditing, and marginal data valuations, with implications for data Shapley and robust data cleaning in high-stakes settings.
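As a rough illustration of the Gaussian influence $G_\mu$ mentioned above, the following Python sketch assumes that it coincides with the standard Gaussian Differential Privacy trade-off function, $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and that composition follows the usual GDP rule $\mu = \sqrt{\sum_i \mu_i^2}$. The function names `gaussian_tradeoff` and `compose_mu` are hypothetical and are not taken from the paper; this is not the f-INE algorithm itself, only a sketch of how a single $\mu$ could be read as a distinguishability level.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution, used for Phi and its inverse.
_STD_NORMAL = NormalDist()


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error of any test between N(0, 1) and N(mu, 1)
    whose type I error is at most alpha (assumed GDP-style definition)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Handle the endpoints explicitly, since inv_cdf is undefined at 0 and 1.
    if alpha == 0.0:
        return 1.0
    if alpha == 1.0:
        return 0.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def compose_mu(mus):
    """Combine per-step mu values under the assumed GDP composition rule."""
    return sqrt(sum(m * m for m in mus))


if __name__ == "__main__":
    # mu = 0 means the two hypotheses are indistinguishable: beta = 1 - alpha.
    print(gaussian_tradeoff(0.3, 0.0))
    # A larger mu lowers the achievable type II error, i.e. more influence.
    print(gaussian_tradeoff(0.05, 1.0), gaussian_tradeoff(0.05, 2.0))
    print(compose_mu([3.0, 4.0]))
```

Under this reading, a datapoint with $\mu$ close to 0 is one whose presence or absence cannot be told apart from training noise, while a larger $\mu$ indicates a more distinguishable, and hence more influential, sample.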
Abstract
Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation or cleanup, since it is unclear whether the correct datapoints were deleted or kept. To overcome this, we introduce *f-influence*, a new influence estimation framework grounded in hypothesis testing that explicitly accounts for training randomness, and establish desirable properties that make it suitable for reliable influence estimation. We also design a highly efficient algorithm, **f**-**IN**fluence **E**stimation (**f-INE**), that computes f-influence **in a single training run**. Finally, we scale up f-INE to estimate the influence of instruction-tuning data on Llama-3.1-8B and show that it can reliably detect poisoned samples that steer model opinions, demonstrating its utility for data cleanup and for attributing model behavior.
