Deeper Understanding of Black-box Predictions via Generalized Influence Functions

Hyeonsu Lyu, Jonggyu Jang, Sehyun Ryu, Hyun Jong Yang

TL;DR

Influence functions struggle to scale to large, non-convex models due to first-order approximations that conflate relevant and nuisance parameter changes. The authors propose Generalized Influence Functions (GIF) that target a subset of parameters tied to the input data and couple this with a modified LiSSA approach that guarantees convergence, enabling accurate data updates with only a small fraction of parameters. Across data removal, label change, and backdoor scenarios, GIF consistently outperforms traditional IFs and baselines, closely matching results from retraining while requiring far less computation. This work provides a foundation for more reliable data-centric explanations, robust model editing, and improved AI interpretability, with accompanying code available to facilitate adoption.

Abstract

Influence functions (IFs) elucidate how training data changes model behavior. However, the increasing size and non-convexity of large-scale models make IFs inaccurate. We suspect that this fragility comes from the first-order approximation, which may cause nuisance changes in parameters irrelevant to the examined data. Simply computing influence from the chosen parameters can also be misleading, as it fails to nullify the hidden effects of unselected parameters on the analyzed data. Our approach therefore introduces generalized IFs, which precisely estimate the target parameters' influence while nullifying nuisance gradient changes on fixed parameters. We identify target update parameters closely associated with the input data using output- and gradient-based parameter selection methods. We verify the generalized IFs against various alternative IFs on class removal and label change tasks. The experiments align with the "less is more" philosophy, demonstrating that updating only 5% of the model produces more accurate results than other influence functions across all tasks. We believe our proposal works as a foundational tool for optimizing models, conducting data analysis, and enhancing AI interpretability beyond the limitations of IFs. Code is available at https://github.com/hslyu/GIF.
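For context on the first-order approximation the abstract refers to, the following is a minimal sketch of the classical (full-parameter) influence-function estimate, not the generalized IF proposed in the paper. It assumes a ridge regression model, chosen here only because its Hessian is available in closed form; the names `fit_ridge`, `lam`, and `k` are illustrative and do not come from the paper or its repository. The sketch estimates how the parameters move when one training point is removed and compares that estimate with exact retraining.

```python
import numpy as np

def fit_ridge(X, y, lam, n_norm):
    """Minimize (1/n_norm) * sum_i 0.5 * (x_i @ theta - y_i)**2 + 0.5 * lam * ||theta||^2.

    Returns the minimizer and the (constant) Hessian of the objective.
    """
    d = X.shape[1]
    H = X.T @ X / n_norm + lam * np.eye(d)
    theta = np.linalg.solve(H, X.T @ y / n_norm)
    return theta, H

# Synthetic data (an assumption made for this illustration).
rng = np.random.default_rng(0)
n, d, lam = 500, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

theta_hat, H = fit_ridge(X, y, lam, n)

# First-order estimate of the parameter change when point k is removed:
# delta ~= (1/n) * H^{-1} * grad of the per-sample loss at theta_hat.
k = 0
grad_k = X[k] * (X[k] @ theta_hat - y[k])
delta_if = np.linalg.solve(H, grad_k) / n

# Ground truth: refit without point k, keeping the same 1/n normalization.
keep = np.arange(n) != k
theta_retrained, _ = fit_ridge(X[keep], y[keep], lam, n)
delta_exact = theta_retrained - theta_hat

rel_err = np.linalg.norm(delta_if - delta_exact) / np.linalg.norm(delta_exact)
print(f"relative error of the first-order estimate: {rel_err:.4f}")
```

In this small convex setting the estimate tracks retraining closely. The abstract's concern is with large, non-convex models, where the same approximation becomes unreliable.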

Paper Structure

This paper contains 43 sections, 2 theorems, 18 equations, 8 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 3.1

For any $\ell(\cdot)$ and $\mathcal{L}(\cdot)$, the following statements hold: i) The GIF is scale-invariant. That is, scalar multiplication of the loss $\ell(\cdot)$ and empirical risk $\mathcal{L}(\cdot)$ does not change $\mathcal{I}(\bm{w}, \hat{\bm{\theta}}_J \mid \hat{\bm{\theta}})$. ii) The

Figures (8)

  • Figure 1: Overview of our approach and the original influence functions with the test accuracy per updated parameter ratio for various parameter selection schemes. Both the influence functions of Koh and Liang (2017) and our approach linearly transform the gradient of the examined data, but our method negates changes in irrelevant parameters by projecting the gradient into the space of selected parameters.
  • Figure 2: Visualization of the weight updates via three influence functions. The contour represents the level curve of the loss. The circular markers represent every 10th removal of four data points. The optimal weight indicates $(\theta_1, \theta_2)$ that provides the least loss when freezing the other parameters.
  • Figure 3: Test accuracy evaluation for various parameter selection criteria. The modification ratio indicates the ratio of parameters updated.
  • Figure 4: Backdoor recovery scenario.
  • Figure 5: Inference histogram from the five models: the original model before removal, the three models updated by the GIF, and the retrained model.
  • ...and 3 more figures
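To make the "modification ratio" in the Figure 3 caption concrete, here is a hedged sketch of how a fixed fraction of parameters might be marked for updating. The selection criterion used below (largest absolute gradient entries) is an assumption made for illustration; the paper's actual output- and gradient-based criteria are not spelled out in this summary, and `select_top_fraction` is a hypothetical helper, not a function from the authors' repository.

```python
import numpy as np

def select_top_fraction(scores, ratio):
    """Return a boolean mask over a 1-D score vector marking the top `ratio` fraction."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1]")
    k = max(1, int(round(ratio * scores.size)))
    # argpartition places the k largest scores in the last k positions.
    top_idx = np.argpartition(scores, -k)[-k:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[top_idx] = True
    return mask

# Stand-in for the flattened gradient of the loss on the examined data.
rng = np.random.default_rng(0)
grad = rng.normal(size=1000)

# Hypothetical criterion: keep the 5% of parameters with the largest |gradient|.
mask = select_top_fraction(np.abs(grad), ratio=0.05)
print(mask.sum(), "of", mask.size, "parameters selected")
```

The mask only identifies which parameters would be updated. As the abstract notes, computing influence on the selected parameters alone can be misleading, so this selection step is not a substitute for the generalized IF itself.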

Theorems & Definitions (3)

  • Proposition 3.1
  • proof
  • Theorem D.1: Neumann series
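The Neumann series is the standard tool behind LiSSA-style approximations of inverse-Hessian-vector products, so a short sketch may help. The statement of Theorem D.1 is not reproduced in this summary; the code below uses the textbook identity $H^{-1} v = \sum_{k \ge 0} (I - H)^k v$, which holds when the spectral radius of $I - H$ is below one. It does not implement the paper's modified LiSSA. The function name `neumann_inverse_hvp` and the `scale` parameter are assumptions introduced for this example.

```python
import numpy as np

def neumann_inverse_hvp(hvp, v, scale, num_steps=500):
    """Approximate H^{-1} v using only Hessian-vector products.

    Iterates h <- v + (I - H/scale) h, whose fixed point is scale * H^{-1} v.
    The iteration converges when the eigenvalues of H lie in (0, 2 * scale).
    """
    h = v.copy()
    for _ in range(num_steps):
        h = v + h - hvp(h) / scale
    return h / scale

# Small symmetric positive-definite stand-in for a Hessian (illustrative only).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
H = A @ A.T / 6 + 0.5 * np.eye(6)
v = rng.normal(size=6)

# Scaling by the largest eigenvalue keeps the eigenvalues of I - H/scale in [0, 1).
scale = np.linalg.eigvalsh(H).max()
approx = neumann_inverse_hvp(lambda u: H @ u, v, scale)
exact = np.linalg.solve(H, v)
print("max abs error:", np.max(np.abs(approx - exact)))
```

The convergence condition is the important caveat: if the Hessian has negative eigenvalues, as can happen in non-convex models, this plain recursion can diverge. The TL;DR states that the paper's modified LiSSA guarantees convergence, which this sketch does not attempt to reproduce.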