The Memory Perturbation Equation: Understanding Model's Sensitivity to Data

Peter Nickl; Lu Xu; Dharmesh Tailor; Thomas Möllenhoff; Mohammad Emtiyaz Khan

TL;DR

The Memory-Perturbation Equation (MPE) addresses the problem of understanding how training data perturbations affect model behavior without costly retraining. It derives from the Bayesian Learning Rule to provide a unified, natural-gradient-based framework that applies across conjugate exponential-family models and extends to training-time sensitivity with Gaussian posteriors. The paper establishes exact recoveries for conjugate models, connects to Cook's influence function and neural-network influence via Laplace approximations, and offers practical, cheap sensitivity estimators that can predict generalization and guide hyperparameter tuning. Empirically, MPE-based estimates correlate with true data perturbations, accurately predict the impact of class removal on test performance, and serve as online diagnostics during training, indicating wide applicability to robust and privacy-aware learning.

Abstract

Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
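As a concrete illustration of estimating sensitivity to example removal without retraining, the sketch below uses the textbook leave-one-out identity for ridge regression. It is not the paper's general MPE, only the simplest linear special case of the idea. The function names (`ridge_fit`, `loo_deviations`), the synthetic data, and the choice of regularizer $\delta\|\boldsymbol{\theta}\|^2/2$ with $\delta = 1$ are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, delta):
    """Minimize 0.5 * ||y - X theta||^2 + 0.5 * delta * ||theta||^2."""
    H = X.T @ X + delta * np.eye(X.shape[1])  # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y)
    return theta, H

def loo_deviations(X, y, delta):
    """Return theta and, for each example i, theta_without_i - theta.

    Uses the Sherman-Morrison identity:
        theta_without_i - theta = -H^{-1} x_i * e_i / (1 - h_i),
    with residual e_i = y_i - x_i^T theta and leverage h_i = x_i^T H^{-1} x_i.
    """
    theta, H = ridge_fit(X, y, delta)
    H_inv_Xt = np.linalg.solve(H, X.T)              # shape (d, n); column i is H^{-1} x_i
    residuals = y - X @ theta                       # e_i
    leverages = np.einsum("ij,ji->i", X, H_inv_Xt)  # h_i
    deviations = -(H_inv_Xt * (residuals / (1.0 - leverages))).T  # shape (n, d)
    return theta, deviations

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

theta, deviations = loo_deviations(X, y, delta=1.0)

# Check one example against actual retraining with that example removed.
i = 7
theta_retrained, _ = ridge_fit(np.delete(X, i, axis=0), np.delete(y, i), delta=1.0)
max_err = np.max(np.abs((theta + deviations[i]) - theta_retrained))
print("max abs difference from retraining:", max_err)
```

In this linear case the deviation is exact, so the difference from retraining is only floating-point error. An example's deviation is large when its residual is large or its leverage is close to 1, which is one simple way to read "high sensitivity" in this setting.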

Paper Structure (32 sections, 4 theorems, 74 equations, 11 figures, 8 tables)

Introduction
Understanding a Model's Sensitivity to Its Training Data
The Memory-Perturbation Equation (MPE)
Unifying the existing sensitivity measures as special cases of the MPE
Generalizing the perturbation method to estimate sensitivity during training
Understanding the causes of high sensitivity estimates for the Gaussian case
Experiments
Discussion
Influence Function for Linear Regression
Derivation of the leave-one-out (LOO) deviation
Derivation of the infinitesimal perturbation approach
Conjugate Exponential-Family Models
The Bayesian Learning Rule
The BLR of \ref{eq:BLR}
The conjugate-model form of the BLR given in \ref{eq:BLR}
...and 17 more sections

Key Result

Theorem 1

Assuming a conjugate exponential-family model, the posterior $q_*^{\backslash \mathcal{M}}$ (with natural parameter ${\boldsymbol{\lambda}_*^{\backslash \mathcal{M}}}$) can be written in terms of $q_*$ (with natural parameter $\boldsymbol{\lambda}_*$), as shown below: where all exponential families are defined by using inner-product $\langle \boldsymbol{\lambda}, \mathbf{T}(\bol

Figures (11)

Figure 1: Our main goal is to estimate the sensitivity of the training trajectory when examples are perturbed or simply removed; see Panel (a). We present the MPE to estimate these sensitivities without any retraining and use the estimates to faithfully predict the test performance from training data alone; see Panel (b). The test negative log-likelihood (gray line) for ResNet-20 on CIFAR10 shows similar trends to the leave-one-out (LOO) score computed on the training data (black line).
Figure 2: The estimated deviation for an example removal correlates well with the true deviations in predictions. Each marker represents an example. For each panel, the histogram at the bottom shows that the majority of examples have low sensitivity and most of the large sensitivities are attributed to a small fraction of data. We show a few images of high and low sensitivity examples from two randomly chosen classes, where we observe the high-sensitivity examples to be more interesting (possibly mislabeled or just ambiguous), while low-sensitivity examples appear more predictable.
Figure 3: Panel (a) shows, on the x-axis, the test NLL of trained models with a class removed. On the y-axis, we show the respective leave-one-class-out (LOCO) estimates. Each marker corresponds to a specific class removed (text indicates class names). Results for two models on FMNIST are shown. Both show good correlation between the test NLL and LOCO estimates; see the dashed lines. Panel (b) shows the evolution of estimated sensitivities during training of LeNet5 on FMNIST. As training progresses, the model becomes more and more sensitive to a small fraction of the data.
Figure 4: The test NLL (gray) almost perfectly matches the estimated LOO-CV error of \ref{eq:loo} (black). The x-axis shows different values of the $\delta$ parameter of an $L_2$-regularization $\delta\|\boldsymbol{\theta}\|^2/2$.
Figure 5: We compare faithfulness of LOO estimates during training to predict the test NLL. The first panel shows results for iBLR where a good match is obtained by using the LOO estimate of \ref{eq:iBLR_loo} which uses a diagonal preconditioner. The next two panels show results for SGD where we use the LOO estimate of \ref{eq:loo} but with different Hessian approximations. Panel (b) uses a diagonal-GGN which does not work very well. Results are improved when K-FAC is used, but they are still not as good as the iBLR, despite using a non-diagonal Hessian approximation.
...and 6 more figures

Theorems & Definitions (4)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
