The Approximate Fisher Influence Function: Faster Estimation of Data Influence in Statistical Models

Omri Lev, Ashia C. Wilson

TL;DR

The paper addresses the computational bottleneck of estimating how infinitesimal training-data perturbations affect model performance. It introduces the Approximate Fisher Influence Function (AFIF), which replaces the Hessian with a Fisher Information Matrix (FIM) derived from an exponential-family loss, yielding a fast, information-geometry–based approach with theoretical guarantees that extend to non-differentiable regularizers. The authors prove error bounds for the AFIF-based inference objectives and demonstrate through experiments on fairness, unlearning, cross-validation, and data attribution that AFIF achieves Hessian-like accuracy while substantially reducing computation time. This yields a scalable, robust tool for tasks such as cross-validation, data attribution, and data removal, with practical impact on model evaluation and responsible ML practices.

Abstract

Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization and enhance existing influence function-based methods by using information geometry to derive a new algorithm to estimate influence. Our formulation proves versatile across various applications, and we further demonstrate in simulations how it remains informative even in non-convex cases. Furthermore, we show that our method offers significant computational advantages over current Newton step-based methods.
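The swap from Hessian to Fisher curvature can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical two-parameter model `f(x) = a * tanh(b * x)`, a squared-error loss (a unit-variance Gaussian likelihood, for which the Fisher matrix reduces to a sum of outer products of the model gradient), a ridge penalty `lam`, and a first-order leave-one-out update `theta + M^{-1} grad_i`, where `M` is either curvature matrix. All function names are illustrative. A nonlinear model is used because for canonical-link generalized linear models, such as logistic regression, the two matrices coincide.

```python
import math

def model_terms(theta, x):
    """Value, gradient, and Hessian (w.r.t. theta) of f(x) = a * tanh(b * x)."""
    a, b = theta
    t = math.tanh(b * x)
    s = 1.0 - t * t  # derivative of tanh at b * x
    grad = (t, a * x * s)
    hess = ((0.0, x * s), (x * s, -2.0 * a * x * x * t * s))
    return a * t, grad, hess

def loss_grad(theta, x, y):
    """Gradient of the per-example loss 0.5 * (f(x) - y)**2."""
    f, g, _ = model_terms(theta, x)
    return ((f - y) * g[0], (f - y) * g[1])

def total_grad(theta, data, lam):
    """Gradient of the summed loss plus the ridge penalty 0.5 * lam * |theta|^2."""
    g0, g1 = lam * theta[0], lam * theta[1]
    for x, y in data:
        d = loss_grad(theta, x, y)
        g0, g1 = g0 + d[0], g1 + d[1]
    return (g0, g1)

def fit(data, lam, theta=(0.5, 0.5), steps=10000, lr=0.02):
    """Plain gradient descent; adequate for this tiny illustrative problem."""
    for _ in range(steps):
        g = total_grad(theta, data, lam)
        theta = (theta[0] - lr * g[0], theta[1] - lr * g[1])
    return theta

def curvature(theta, data, lam, kind):
    """2x2 curvature matrix: 'fisher' keeps only grad-outer-product terms,
    'hessian' adds the residual-weighted second derivatives of the model."""
    m = [[lam, 0.0], [0.0, lam]]
    for x, y in data:
        f, g, h = model_terms(theta, x)
        for i in range(2):
            for j in range(2):
                m[i][j] += g[i] * g[j]
                if kind == "hessian":
                    m[i][j] += (f - y) * h[i][j]
    return m

def solve2(m, v):
    """Solve the 2x2 linear system m @ x = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det)

def loo_estimate(theta, data, lam, i, kind):
    """First-order estimate of the parameters after removing example i."""
    step = solve2(curvature(theta, data, lam, kind), loss_grad(theta, *data[i]))
    return (theta[0] + step[0], theta[1] + step[1])

# Synthetic data (an assumption for illustration): noisy samples of 1.5 * tanh(0.8 x).
xs = [-2.0 + 0.5 * k for k in range(9)]
data = [(x, 1.5 * math.tanh(0.8 * x) + 0.05 * math.sin(7 * k)) for k, x in enumerate(xs)]
lam = 0.1

theta_hat = fit(data, lam)
fisher = curvature(theta_hat, data, lam, "fisher")
loo_fisher = loo_estimate(theta_hat, data, lam, 0, "fisher")
loo_hessian = loo_estimate(theta_hat, data, lam, 0, "hessian")
print("fitted parameters:", theta_hat)
print("leave-one-out (Fisher): ", loo_fisher)
print("leave-one-out (Hessian):", loo_hessian)
```

In this sketch the Fisher variant never evaluates second derivatives of the model, and its matrix is positive definite by construction whenever `lam > 0`. The Hessian variant carries no such guarantee once residuals are large.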

Paper Structure

This paper contains 53 sections, 12 theorems, 75 equations, 7 figures, 1 table.

Key Result

Proposition 1

Suppose Assump. ass:one_1, Assump. ass:two_1, and Assump. ass:boundedmoments hold for $(s,r) \in \{(0,3), (1,3), (1,4), (1,2), (2,2), (3,2)\}$. When the infinitesimal jackknife (IJ) is used as a plug-in estimate for the leave-one-out cross-validation (LOOCV) objective with $w^n \in \mathcal{D}^{-i}$, the error in this approximation is bounded as

Figures (7)

  • Figure 1: Model performance versus fairness metric for Fisher-based influence, Hessian-based influence, and the ERM solution from \ref{eq:WERM_Def}, evaluated on the Adult, Crime, and Insurance datasets using a two-layer classifier. Results are averaged over ten independent experiments. All cases demonstrate that the Fisher-based computations are faster than the Hessian-based computations yet still yield similar overall utility.
  • Figure 2: Test loss and CV estimators for Fisher-based and Hessian-based influence on the Adult dataset using a two-layer classifier, averaged over five folds. Fisher calculations are approximately twice as fast as Hessian computations, and the Hessian estimates are highly unstable, yielding invalid loss estimates.
  • Figure 3: Most and least influential images on a subset of CIFAR10 when using a simple CNN architecture.
  • Figure 4: Most and least influential images on a subset of CIFAR10 when using the ResNet18 architecture.
  • Figure 5: Running times for Fisher-based and Hessian-based influence function when calculated on a subset of CIFAR10 classified using ResNet18 and a simple three-layer CNN. In both cases, Fisher-based influence significantly accelerates the influence calculation.
  • ...and 2 more figures

Theorems & Definitions (25)

  • Remark 1
  • Remark 2
  • Proposition 1: LOOCV Approximation Bound (Wilson_OptimizerComparison, Thm. 4)
  • Proposition 2: Machine Unlearning (WilsonSuriyakumar_UnlearningProximal)
  • Proposition 3: Data Attribution (koh2019accuracy, Prop. 1)
  • Lemma 1
  • Theorem 1
  • Corollary 1: LOOCV
  • Corollary 2: Machine Unlearning
  • Corollary 3: Data Attribution
  • ...and 15 more