SG-OIF: A Stability-Guided Online Influence Framework for Reliable Vision Data

Penghao Rao, Runmin Jiang, Min Xu

TL;DR

SG-OIF introduces a stability-guided online influence framework to enable reliable, real-time estimation of training-point influence in vision models. It fuses per-anchor IHVPs with stability-based confidence gating, modular curvature backends, and streaming refinement to produce robust, interpretable per-sample influence scores during training. The approach comes with theoretical guarantees under local PL conditions and empirical demonstrations showing state-of-the-art performance on noisy-label detection and OOD detection across diverse datasets, while maintaining lower computational overhead. This work provides a practical, provable pathway for continuous data oversight, targeted unlearning, and robust data-quality assessment in real-world vision systems.

Abstract

Approximating the influence of training points on test predictions is critical for deploying deep-learning vision models and essential for locating noisy data. Although the influence function was proposed to attribute how infinitesimal up-weighting or removal of individual training examples affects model outputs, implementing it in deep-learning vision models remains challenging: inverse-curvature computations are expensive, and training non-stationarity invalidates static approximations. Prior works use iterative solvers and low-rank surrogates to reduce cost, but offline computation lags behind training dynamics, and missing confidence calibration yields fragile rankings that misidentify critical examples. To address these challenges, we introduce a Stability-Guided Online Influence Framework (SG-OIF), the first framework that treats algorithmic stability as a real-time controller, which (i) maintains lightweight anchor IHVPs via stochastic Richardson and preconditioned Neumann; (ii) proposes modular curvature backends to modulate per-example influence scores using stability-guided residual thresholds, anomaly gating, and confidence. Experimental results show that SG-OIF achieves state-of-the-art (SOTA) performance on noisy-label and out-of-distribution detection tasks across multiple datasets with various corruptions. Notably, our approach achieves 91.1% accuracy among the top 1% of predicted samples on CIFAR-10 (20% asymmetric noise) and attains a 99.8% AUPR score on MNIST, demonstrating that this framework is a practical controller for online influence estimation.
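To make the "anchor IHVP" idea concrete: an inverse-Hessian-vector product $\varphi_a$ approximately solves $H \varphi_a = v_a$ for a positive-definite curvature operator $H$. The block below is a minimal sketch of a plain, deterministic Richardson iteration with residual-based early stopping, written in dependency-free Python. It is not the paper's stochastic or preconditioned variant. The function name `richardson_ihvp`, the dense list-of-rows `H`, the fixed `step`, and the tolerance `tol` are all illustrative assumptions.

```python
def matvec(H, x):
    """Dense matrix-vector product for a list-of-rows matrix H."""
    return [sum(h_ij * x_j for h_ij, x_j in zip(row, x)) for row in H]


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return sum(x_i * x_i for x_i in x) ** 0.5


def richardson_ihvp(H, v, step, tol=1e-8, max_iters=10_000):
    """Approximate phi = H^{-1} v with phi <- phi + step * (v - H phi).

    Assumes H is symmetric positive definite; the iteration then
    converges whenever 0 < step < 2 / lambda_max(H).
    Returns (phi, residual_norm, iterations_used).
    """
    phi = [0.0] * len(v)
    for k in range(max_iters):
        # Residual r = v - H phi measures how far phi is from solving H phi = v.
        r = [v_i - hp_i for v_i, hp_i in zip(v, matvec(H, phi))]
        res = norm(r)
        if res <= tol:  # residual-controlled early stopping
            return phi, res, k
        phi = [p_i + step * r_i for p_i, r_i in zip(phi, r)]
    # Iteration budget exhausted: report the final residual honestly.
    r = [v_i - hp_i for v_i, hp_i in zip(v, matvec(H, phi))]
    return phi, norm(r), max_iters


# Toy 2x2 curvature matrix (hypothetical values, largest eigenvalue about 2.21).
H = [[2.0, 0.5], [0.5, 1.0]]
v = [1.0, -1.0]
phi, res, iters = richardson_ihvp(H, v, step=0.5)
```

The returned residual norm is the quantity a stability-guided controller could inspect before trusting the corresponding anchor. In a real vision model, `matvec` would be replaced by a Hessian-vector or curvature-surrogate product rather than a dense matrix.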

Paper Structure

This paper contains 87 sections, 6 theorems, 74 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Lemma 1

Let $H \succ 0$ denote the curvature operator at a fixed training time with $\lambda_{\min}(H) \geq m > 0$, and let $v \in \mathbb{R}^d$ be the target vector. For a sample $z$, define the true single-anchor influence as where $g_z = \nabla_\theta \ell(\theta; z)$. Let the estimated multi-anchor score be where $\varphi_a$ is an approximate IHVP for anchor $a$ with target $v_a$, the residual is $r

Figures (6)

  • Figure 1: Workflow of SG-OIF. The curvature surrogate is updated online; then inverse-Hessian-vector proxies for the anchors are tracked with lightweight iterations and calibrated by stability-based confidence; the influence of each training data point is computed by aggregating stability-weighted per-anchor scores, with optional refinement for points that have high influence but low confidence; finally, the most influential training data points are returned.
  • Figure 2: Overview of the SG-OIF Architecture. The pipeline begins with the input, then computes influence scores via stability-guided scoring and returns the most influential points. In detail, for samples that exhibit high influence but low confidence weights, an optional refinement step is triggered to enhance the estimation. This step is the so-called Stability-Guided Controller. Influence is anchored: anchored samples flow back to the anchored volume. Multi-anchor weighted aggregation combines per-anchor results into a refined influence score. This step also performs diagnostics and anchor maintenance, which modifies the anchored volume. These calibrated scores feed ranking and task adaptors to select the most influential data and support downstream actions. The framework ensures reliable, robust influence estimation and output.
  • Figure 3: Residual Convergence and AUPR Improvement Under Stability Gating. SG-OIF (green) achieves rapid residual (solid line) convergence and monotonic AUPR (dashed line) growth, outperforming no-gating (red) and simple MA gating (blue), confirming the necessity of stability monitoring for robust ranking.
  • Figure 4: Reproducibility and Stability. SG-OIF (green) exhibits lower standard deviation variance (left) and AUPR variance (right) compared to no-gating (red) and simple MA gating (blue), confirming the strong reproducibility and stability of SG-OIF.
  • Figure 5: Effectiveness of Stability-Guided Estimation. Left: SG-OIF converges rapidly with minimal variance, while no-gating exhibits persistent oscillations. Right: SG-OIF outperforms the baselines. The improvement validates the effectiveness of stability-guided estimation.
  • ...and 1 more figure
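The Figure 1 and Figure 2 captions describe aggregating stability-weighted per-anchor scores and triggering refinement for samples with high influence but low confidence. The sketch below shows one way such an aggregation could look in dependency-free Python. The weighting rule `1 / (1 + residual / tau)`, the thresholds, the plain dot-product score (sign conventions for influence vary), and the name `aggregate_influence` are all assumptions made for illustration. The paper's actual weighting and gating rules are not given in this excerpt.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_influence(g_z, anchors, tau=0.1, score_thresh=1.0, conf_thresh=0.5):
    """Combine per-anchor scores g_z . phi_a using residual-based weights.

    anchors: list of (phi_a, residual_norm_a) pairs, where residual_norm_a
    is the norm of v_a - H phi_a for that anchor's approximate IHVP.
    Returns (score, confidence, needs_refinement).
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    # Hypothetical confidence weight: 1 for a zero residual, decaying as it grows.
    weights = [1.0 / (1.0 + res / tau) for _, res in anchors]
    total = sum(weights)
    # Confidence-weighted average of the per-anchor scores.
    score = sum(w * dot(g_z, phi) for w, (phi, _) in zip(weights, anchors)) / total
    # Mean weight serves as an overall confidence in [0, 1].
    confidence = total / len(anchors)
    # Flag samples that look influential but rest on poorly converged anchors.
    needs_refinement = abs(score) >= score_thresh and confidence < conf_thresh
    return score, confidence, needs_refinement


g_z = [1.0, 2.0]  # toy per-sample gradient (hypothetical values)
well_converged = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.3)]
poorly_converged = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.9)]

score_a, conf_a, refine_a = aggregate_influence(g_z, well_converged)
score_b, conf_b, refine_b = aggregate_influence(g_z, poorly_converged)
```

In the first call the anchor with the larger residual is down-weighted, so the score leans toward the well-converged anchor and no refinement is requested. In the second call both anchors have large residuals, so the sample is flagged for refinement even though its raw score is high.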

Theorems & Definitions (12)

  • Lemma 1: Influence Error Decomposition
  • proof
  • Lemma 2: Top-$K$ Order Preservation
  • proof
  • Lemma 3: Probability with Confidence-Weighted Estimation
  • proof
  • Lemma 4: Convergence and Residual-Controlled Early Stopping
  • proof
  • Lemma 5: Projection Error Bound via Gram Conditioning
  • proof
  • ...and 2 more