The Memory Perturbation Equation: Understanding Model's Sensitivity to Data
Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan
TL;DR
The Memory-Perturbation Equation (MPE) addresses the problem of understanding how perturbations of the training data affect model behavior without costly retraining. It is derived from the Bayesian Learning Rule and provides a unified, natural-gradient-based framework that applies to conjugate exponential-family models and extends to training-time sensitivity with Gaussian posteriors. The paper shows that the equation is exact for conjugate models, connects it to Cook's influence function and to influence measures for neural networks via Laplace approximations, and offers cheap, practical sensitivity estimators that can predict generalization and guide hyperparameter tuning. Empirically, MPE-based estimates correlate with the true effect of data perturbations, accurately predict the impact of class removal on test performance, and serve as online diagnostics during training, which suggests wide applicability to robust and privacy-aware learning.
Abstract
Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
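
To make "sensitivity to perturbation in the training data" concrete, the following is a minimal sketch of one classical sensitivity measure of the kind the abstract refers to: the closed-form leave-one-out deviation for ridge regression, which gives the change in the weights when one example is removed without refitting. It is not the MPE itself or the authors' code; the function names, the synthetic data, and the regularization strength `delta` are illustrative assumptions.

```python
import numpy as np

def loo_deviation(X, y, i, delta=1.0):
    """Closed-form change in ridge weights when example i is removed.

    Assumes the objective sum_n (y_n - x_n^T theta)^2 + delta * ||theta||^2.
    """
    H = X.T @ X + delta * np.eye(X.shape[1])   # regularized Hessian (up to a factor of 2)
    theta = np.linalg.solve(H, X.T @ y)        # weights fitted on all data
    Hinv_x = np.linalg.solve(H, X[i])          # H^{-1} x_i
    residual = y[i] - X[i] @ theta             # prediction error on example i
    leverage = X[i] @ Hinv_x                   # x_i^T H^{-1} x_i, below 1 for delta > 0
    return -Hinv_x * residual / (1.0 - leverage)

def retrain_deviation(X, y, i, delta=1.0):
    """Same quantity computed the expensive way, by refitting without example i."""
    def fit(A, b):
        return np.linalg.solve(A.T @ A + delta * np.eye(A.shape[1]), A.T @ b)
    keep = np.arange(len(y)) != i
    return fit(X[keep], y[keep]) - fit(X, y)

# Hypothetical synthetic data, used only to exercise the two functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
```

Calling `loo_deviation(X, y, 7)` and `retrain_deviation(X, y, 7)` should agree up to floating-point error, since the identity follows from the Sherman-Morrison formula. Examples with a large residual or high leverage move the weights the most, which is the intuition behind using such quantities as sensitivity estimates.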
