Deeper Understanding of Black-box Predictions via Generalized Influence Functions
Hyeonsu Lyu, Jonggyu Jang, Sehyun Ryu, Hyun Jong Yang
TL;DR
Influence functions struggle to scale to large, non-convex models because their first-order approximations conflate relevant and nuisance parameter changes. The authors propose Generalized Influence Functions (GIF), which restrict the influence estimate to a subset of parameters tied to the input data, and pair them with a modified LiSSA solver that guarantees convergence. Together, these let the model reflect a data update accurately while changing only a small fraction of its parameters. Across data removal, label change, and backdoor scenarios, GIF consistently outperforms traditional IFs and other baselines, closely matching the results of retraining while requiring far less computation. This work provides a foundation for more reliable data-centric explanations, robust model editing, and improved AI interpretability, and the accompanying code is publicly available to facilitate adoption.
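For readers unfamiliar with LiSSA, the following is a minimal NumPy sketch of the standard damped and scaled LiSSA recursion for inverse-Hessian-vector products, applied to a hypothetical subset of parameter indices. It does not reproduce the authors' convergence-guaranteeing modification; the function name `lissa_ihvp`, the toy Hessian, the `target` indices, and the choice of `scale` just above the largest eigenvalue are illustrative assumptions.

```python
import numpy as np


def lissa_ihvp(hvp, g, damping=0.01, scale=10.0, num_iters=1000):
    """Approximate (H + damping*I)^{-1} g with the standard LiSSA recursion.

    hvp: callable returning H @ v for a vector v (H symmetric).
    Converges when every eigenvalue of (H + damping*I) / scale lies in (0, 2).
    """
    v = g.copy()
    for _ in range(num_iters):
        # v_j = g + (I - (H + damping*I) / scale) v_{j-1}
        v = g + v - (hvp(v) + damping * v) / scale
    # The fixed point is scale * (H + damping*I)^{-1} g, so undo the scaling.
    return v / scale


# Toy symmetric positive definite "Hessian" (illustrative, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H = A @ A.T + np.eye(6)
g = rng.standard_normal(6)

# Hypothetical target subset: only these parameters are updated.
target = np.array([0, 2, 5])
H_tt = H[np.ix_(target, target)]

# Choose scale above the largest eigenvalue so the recursion contracts.
scale = 1.1 * np.linalg.eigvalsh(H_tt).max()
approx = lissa_ihvp(lambda v: H_tt @ v, g[target],
                    damping=0.0, scale=scale, num_iters=2000)
exact = np.linalg.solve(H_tt, g[target])
print(np.allclose(approx, exact))
```

The convergence condition in the docstring is the crux: in a non-convex model the Hessian can have negative eigenvalues, which pushes (H + damping·I)/scale outside (0, 2) and makes the plain recursion diverge. Damping is the usual remedy in standard LiSSA.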
Abstract
Influence functions (IFs) elucidate how training data changes model behavior. However, the increasing size and non-convexity of large-scale models make IFs inaccurate. We suspect that this fragility comes from the first-order approximation, which may cause nuisance changes in parameters irrelevant to the examined data. Yet simply computing influence from the chosen parameters can be misleading, as it fails to nullify the hidden effects of unselected parameters on the analyzed data. Thus, our approach introduces generalized IFs, which precisely estimate the target parameters' influence while nullifying nuisance gradient changes on the fixed parameters. We identify target update parameters closely associated with the input data using output- and gradient-based parameter selection methods. We compare the generalized IFs against various IF alternatives on the class removal and label change tasks. The experiments align with the "less is more" philosophy, demonstrating that updating only 5% of the model produces more accurate results than other influence functions across all tasks. We believe our proposal can serve as a foundational tool for optimizing models, conducting data analysis, and enhancing AI interpretability beyond the limitations of IFs. Code is available at https://github.com/hslyu/GIF.
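The idea of updating only a small, data-relevant slice of the model can be illustrated with a minimal NumPy sketch. It assumes (1) a gradient-based selection rule that keeps the parameters with the largest |∇ℓ(z)| for the examined example z, and (2) a first-order removal update that solves only the target block H_TT of the training-loss Hessian while the unselected parameters stay fixed. Both rules, the helper names `select_by_gradient` and `restricted_influence`, and the toy numbers are hypothetical; the paper's output-based selection and its exact treatment of nuisance gradients are not reproduced here.

```python
import numpy as np


def select_by_gradient(grad, fraction=0.05):
    """Indices of the `fraction` of parameters with the largest |gradient|."""
    k = max(1, int(round(fraction * grad.size)))
    # argpartition on -|grad| places the k largest magnitudes in the first k slots.
    idx = np.argpartition(-np.abs(grad), k - 1)[:k]
    return np.sort(idx)


def restricted_influence(H, grad_z, target, n):
    """First-order parameter change from removing one of n training points,
    updating only `target` parameters and holding all others fixed.

    H is the Hessian of the average training loss; grad_z is the gradient of
    the removed point's loss at the trained parameters.
    """
    delta = np.zeros_like(grad_z)
    H_tt = H[np.ix_(target, target)]
    # Removing z equals upweighting it by -1/n: delta_T = (1/n) H_TT^{-1} grad_T.
    delta[target] = np.linalg.solve(H_tt, grad_z[target]) / n
    return delta


# Toy 10-parameter model (numbers are illustrative only).
grad_z = np.array([0.1, -3.0, 0.5, 2.0, -0.05, 0.0, 1.0, -0.2, 0.3, 0.4])
H = 2.0 * np.eye(10) + 0.1 * np.ones((10, 10))
n = 100

target = select_by_gradient(grad_z, fraction=0.2)  # keeps 2 of 10 parameters
delta = restricted_influence(H, grad_z, target, n)
print(target)  # [1 3]
print(delta[target])
```

The default `fraction=0.05` mirrors the 5% figure above; the toy call uses 0.2 so that the 10-parameter example keeps two parameters. Because the Hessian block here is only 2×2, `np.linalg.solve` is used directly; for a real network, an iterative inverse-Hessian-vector product such as LiSSA would take its place.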
