SG-OIF: A Stability-Guided Online Influence Framework for Reliable Vision Data
Penghao Rao, Runmin Jiang, Min Xu
TL;DR
SG-OIF is a stability-guided online influence framework for reliable, real-time estimation of training-point influence in vision models. It combines per-anchor inverse-Hessian-vector products (IHVPs) with stability-based confidence gating, modular curvature backends, and streaming refinement to produce robust, interpretable per-sample influence scores during training. The approach comes with theoretical guarantees under local Polyak-Łojasiewicz (PL) conditions, and experiments show state-of-the-art performance on noisy-label detection and out-of-distribution (OOD) detection across diverse datasets, with lower computational overhead. The work offers a practical, provable pathway for continuous data oversight, targeted unlearning, and robust data-quality assessment in real-world vision systems.
Abstract
Approximating the influence of training points on test predictions is critical for deploying deep-learning vision models and essential for locating noisy data. Although the influence function was proposed to attribute how infinitesimally up-weighting or removing individual training examples affects model outputs, implementing it in deep-learning vision models remains challenging: inverse-curvature computations are expensive, and training non-stationarity invalidates static approximations. Prior work uses iterative solvers and low-rank surrogates to reduce cost, but offline computation lags behind training dynamics, and the lack of confidence calibration yields fragile rankings that misidentify critical examples. To address these challenges, we introduce the Stability-Guided Online Influence Framework (SG-OIF), the first framework that treats algorithmic stability as a real-time controller. SG-OIF (i) maintains lightweight anchor inverse-Hessian-vector products (IHVPs) via stochastic Richardson and preconditioned Neumann iterations; and (ii) proposes modular curvature backends to modulate per-example influence scores using stability-guided residual thresholds, anomaly gating, and confidence. Experimental results show that SG-OIF achieves state-of-the-art (SOTA) performance on noisy-label and out-of-distribution (OOD) detection tasks across multiple datasets with various corruptions. Notably, our approach achieves 91.1% accuracy on the top 1% of prediction samples on CIFAR-10 (20% asymmetric noise) and a 99.8% AUPR score on MNIST, demonstrating that the framework is a practical controller for online influence estimation.
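To make the IHVP step concrete, the following is a minimal sketch of a plain (deterministic) Richardson iteration that approximates v = H^{-1} g using only Hessian-vector products. It is an illustration under stated assumptions, not the SG-OIF implementation: the function name `richardson_ihvp`, the toy matrix `H`, the vector `g`, and the step-size rule are all hypothetical. The stochastic updates, preconditioning, and stability-guided gating described in the abstract are omitted.

```python
import numpy as np

def richardson_ihvp(hvp, g, step, num_iters=300):
    """Approximate v = H^{-1} g with Richardson iteration.

    hvp  : callable returning H @ x for a vector x (Hessian-vector product)
    g    : right-hand-side vector (e.g. a gradient at an anchor point)
    step : step size; for symmetric positive-definite H, any value in
           (0, 2 / lambda_max(H)) gives convergence
    """
    v = np.zeros_like(g)
    for _ in range(num_iters):
        residual = g - hvp(v)      # r_k = g - H v_k
        v = v + step * residual    # v_{k+1} = v_k + step * r_k
    # Return the estimate and the final residual norm ||g - H v||.
    return v, float(np.linalg.norm(g - hvp(v)))

# Toy symmetric positive-definite matrix standing in for a model's curvature
# (an assumption for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A @ A.T + 5.0 * np.eye(5)
g = rng.standard_normal(5)

step = 1.0 / np.linalg.eigvalsh(H).max()  # safely inside (0, 2 / lambda_max)
v, res = richardson_ihvp(lambda x: H @ x, g, step)
exact = np.linalg.solve(H, g)             # direct solve, for comparison only
```

The final residual norm is returned alongside the estimate because a residual of this kind is a natural quantity to threshold when deciding how much to trust an IHVP. How SG-OIF actually defines and uses its residual thresholds is not specified in this section.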
