The Approximate Fisher Influence Function: Faster Estimation of Data Influence in Statistical Models
Omri Lev, Ashia C. Wilson
TL;DR
The paper addresses the computational bottleneck of estimating how infinitesimal perturbations of the training data affect model performance. It introduces the Approximate Fisher Influence Function (AFIF), which replaces the Hessian with a Fisher Information Matrix (FIM) derived from an exponential-family loss, yielding a fast, information-geometric approach whose theoretical guarantees extend to non-differentiable regularizers. The authors prove error bounds for AFIF-based inference objectives and show, in experiments on fairness, unlearning, cross-validation, and data attribution, that AFIF achieves Hessian-like accuracy at substantially lower computation time. The result is a scalable tool for model evaluation and responsible ML practice.
Abstract
Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization problem and enhance existing influence-function-based methods by using information geometry to derive a new algorithm for estimating influence. Our formulation proves versatile across various applications, and we further demonstrate in simulations that it remains informative even in non-convex cases. Furthermore, we show that our method offers significant computational advantages over current Newton step-based methods.
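To make the weighted empirical risk minimization view concrete, the following is a minimal sketch, not the authors' implementation. It assumes an L2-regularized logistic regression, and the names `fit`, `fisher_influence`, `lam`, and the synthetic data are all hypothetical choices made for illustration. The sketch approximates the minimizer under perturbed sample weights with a single step preconditioned by the Fisher information matrix of the Bernoulli model, then compares the result against an exact refit. For this canonical-link model the Fisher matrix coincides with the Hessian of the log-loss, so the example shows only the mechanics; in models where the two differ, the Fisher matrix needs only first-order derivatives of the model output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, w, lam, iters=50):
    """Minimize sum_i w_i * logloss_i(theta) + (lam / 2) * ||theta||^2 with Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (w * (p - y)) + lam * theta
        curv = (X * (w * p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
        theta = theta - np.linalg.solve(curv, grad)
    return theta

def fisher_influence(X, y, theta_hat, w, lam):
    """Approximate the minimizer under weights w with one Fisher-preconditioned step.

    theta_hat is the minimizer under unit weights. The Fisher matrix is that of
    the Bernoulli model, sum_i p_i (1 - p_i) x_i x_i^T, plus the ridge term.
    """
    p = sigmoid(X @ theta_hat)
    fisher = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
    # Gradient of the reweighted objective at theta_hat; the unit-weight part vanishes.
    grad_w = X.T @ ((w - 1.0) * (p - y))
    return theta_hat - np.linalg.solve(fisher, grad_w)

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d, lam = 200, 3, 1.0
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=n) < sigmoid(X @ theta_true)).astype(float)

theta_hat = fit(X, y, np.ones(n), lam)

# Leave-one-out as a weight perturbation: set the weight of sample 0 to zero.
w = np.ones(n)
w[0] = 0.0
theta_approx = fisher_influence(X, y, theta_hat, w, lam)
theta_exact = fit(X, y, w, lam)

print("parameter change (exact refit):", np.linalg.norm(theta_exact - theta_hat))
print("approximation error:           ", np.linalg.norm(theta_approx - theta_exact))
```

The same `fisher_influence` call handles other weight vectors, for example zeroing several weights to mimic removing a batch of points or a cross-validation fold, at the cost of one linear solve rather than a full refit.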
