Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training
Jie Xu, Zihan Wu
TL;DR
Dynamic Influence Tracker (DIT) measures the influence of individual training samples over arbitrary time windows of SGD training, without requiring the model to converge. It estimates the parameter change Δβ_{-j}^{[t1,t2]} caused by removing sample j during the window [t1, t2] via a Hessian-informed estimate, then projects that change onto query vectors q(t) to quantify how the removal affects losses, predictions, and gradients. The approach comes with error bounds that assume neither loss convexity nor model convergence, and it achieves up to 0.99 correlation with ground-truth influence and >98% accuracy in corrupted-sample detection across diverse architectures. Empirically, DIT uncovers four influence-dynamics patterns, shows that mid-training influence aligns with full-training influence, and scales to large models, which supports practical data-quality interventions and more efficient training strategies.
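To make the quantity concrete, the following is a minimal sketch of the brute-force counterfactual that a time-windowed influence estimate is meant to approximate; it is not DIT itself and uses no Hessian information. It assumes single-sample SGD on a squared loss with a linear model, a fixed sample order, and removal of sample j only inside the window [t1, t2). The function names `sgd_window` and `windowed_influence`, the learning rate, and the toy data are illustrative assumptions, not taken from the paper.

```python
def sgd_window(theta, data, order, t1, t2, lr, skip=None):
    """Run single-sample SGD steps t1..t2-1 on a squared loss.

    Steps whose sample index equals `skip` are dropped, which models
    removing that sample from training inside the window.
    """
    theta = list(theta)
    for t in range(t1, t2):
        i = order[t]
        if i == skip:
            continue  # counterfactual run: this sample's update is removed
        x, y = data[i]
        err = sum(w * xi for w, xi in zip(theta, x)) - y
        theta = [w - lr * err * xi for w, xi in zip(theta, x)]
    return theta


def windowed_influence(data, order, theta0, j, t1, t2, q, lr):
    """Return q . (theta_without_j - theta_full) at the end of the window."""
    # Both runs share the same trajectory up to step t1 (an assumption here).
    theta_t1 = sgd_window(theta0, data, order, 0, t1, lr)
    full = sgd_window(theta_t1, data, order, t1, t2, lr)
    without_j = sgd_window(theta_t1, data, order, t1, t2, lr, skip=j)
    delta = [a - b for a, b in zip(without_j, full)]
    # The query vector q selects what is measured, e.g. one coordinate
    # of the parameters or the gradient of a test loss.
    return sum(qi * di for qi, di in zip(q, delta))


# Toy usage: two samples, visited alternately.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
order = [0, 1, 0, 1]
score = windowed_influence(data, order, [0.0, 0.0], j=0, t1=0, t2=2,
                           q=[1.0, 0.0], lr=0.5)
```

This exact recomputation needs one extra training run per sample and per window, which is the cost an estimator of the parameter change is designed to avoid. A sample that is never visited inside the window gets a score of exactly zero under this definition.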
Abstract
Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers three key insights: 1) Samples show different time-varying influence patterns, with some samples important in the early training stage while others become important later. 2) Sample influences show a weak correlation between early and late stages, demonstrating that the model undergoes distinct learning phases with shifting priorities. 3) Analyzing influence during the convergence period provides more efficient and accurate detection of corrupted samples than full-training analysis. Supported by theoretical guarantees without assuming loss convexity or model convergence, DIT significantly outperforms existing methods, achieving up to 0.99 correlation with ground truth and above 98% accuracy in detecting corrupted samples in complex architectures.
