Evaluating Machine Unlearning via Epistemic Uncertainty

Alexander Becker; Thomas Liebig

TL;DR

This work addresses the evaluation of Machine Unlearning by proposing a metric based on epistemic uncertainty. It defines the information score $\imath(\theta; D) = \mathrm{tr}(\mathcal{I}(\theta; D))$ via the empirical Fisher Information and derives an efficacy measure $\text{efficacy}(\theta; D) = 1/\imath(\theta; D)$, along with a computable upper bound $\text{efficacy}(\theta; D) \le 1/\|\nabla \mathcal{L}(\theta, D)\|_2^2$ that avoids full data re-processing. The authors compare three forgetting approaches (Retraining, Amnesiac Unlearning, and Fisher Forgetting) on MNIST and CIFAR-10 and show that neither a drop in accuracy nor a defeated adversarial attack is enough, on its own, to guarantee that sensitive information has been removed. The results indicate that the direction of the update matters: Retraining and Fisher Forgetting tend toward reduced information exposure, while Amnesiac Unlearning can drift the model back toward its original state, which highlights the need for multiple, complementary metrics. Overall, the paper provides a practical, scalable framework for evaluating forgetting and outlines directions for broader surveys and for connections to privacy guarantees such as certified removal and differential privacy.
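As a rough illustration of these quantities, the sketch below computes the information score, the efficacy, and the gradient-based upper bound for a small softmax-regression model in NumPy. Everything here is an assumption made for the example rather than the authors' implementation: the linear model, the synthetic data, the helper names (`residuals`, `information_score`, `efficacy`, `efficacy_upper_bound`), and the choice to average both the empirical Fisher Information and the cross-entropy loss over the samples in D.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def residuals(W, X, y):
    # p(.|x) - onehot(y): gradient of the per-sample cross-entropy w.r.t. the logits.
    R = softmax(X @ W)
    R[np.arange(len(y)), y] -= 1.0
    return R

def information_score(W, X, y):
    # Trace of the (sample-averaged, assumed) empirical Fisher Information:
    # the mean squared norm of the per-sample gradients w.r.t. W.
    R = residuals(W, X, y)
    per_sample = X[:, :, None] * R[:, None, :]   # shape (n, d, k)
    return float(np.mean(np.sum(per_sample ** 2, axis=(1, 2))))

def efficacy(W, X, y):
    return 1.0 / information_score(W, X, y)

def efficacy_upper_bound(W, X, y):
    # Needs only the gradient of the mean loss, not per-sample gradients.
    grad = X.T @ residuals(W, X, y) / len(y)     # shape (d, k)
    return 1.0 / float(np.sum(grad ** 2))

# Synthetic stand-in for a target set D (hypothetical data, 3 classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.integers(0, 3, size=64)
W = rng.normal(size=(5, 3))

print("efficacy:   ", efficacy(W, X, y))
print("upper bound:", efficacy_upper_bound(W, X, y))
```

The bound only requires one gradient of the averaged loss, which is typically available from an ordinary backward pass, whereas the exact score needs a gradient per sample.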

Abstract

Interest in Machine Unlearning has grown recently, primarily due to legal requirements such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act. In response, multiple approaches have been proposed to remove the influence of specific target data points from a trained model. However, when evaluating the success of unlearning, current approaches either use adversarial attacks or compare their results to the optimal solution, which usually involves retraining from scratch. We argue that both ways are insufficient in practice. In this work, we present an evaluation metric for Machine Unlearning algorithms based on epistemic uncertainty. To the best of our knowledge, this is the first definition of a general evaluation metric for Machine Unlearning.

Paper Structure (11 sections, 2 theorems, 14 equations, 8 figures, 2 tables)

This paper contains 11 sections, 2 theorems, 14 equations, 8 figures, 2 tables.

Introduction and Related Work
Unlearning Algorithms
Retraining
Amnesiac Unlearning
Fisher Forgetting
Measuring the Success of Forgetting
Evaluating Forgetting via Epistemic Uncertainty
Experiments
Experimental Setup
Results
Conclusion and Future Work

Key Result

Theorem

Let $\mathcal{L}(\theta, D)$ be the cross-entropy loss. The squared gradient norm of the cross-entropy loss forms a lower bound for the information score:

$$\imath(\theta; D) \ge \|\nabla \mathcal{L}(\theta, D)\|_2^2$$
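
The following is a brief sketch of why the information score can be bounded from below by the squared gradient norm, $\imath(\theta; D) \ge \|\nabla \mathcal{L}(\theta, D)\|_2^2$, and how this gives the upper bound on the efficacy. It rests on assumptions not stated in this summary: the empirical Fisher Information is taken as the average of per-sample outer products of log-likelihood gradients $g_{x,y}$ (a symbol introduced here), and the cross-entropy loss is averaged over $D$. The paper's own proof may proceed differently.

```latex
\begin{align*}
\imath(\theta; D)
  &= \mathrm{tr}\Big(\tfrac{1}{|D|}\sum_{(x,y)\in D} g_{x,y}\, g_{x,y}^\top\Big)
   = \tfrac{1}{|D|}\sum_{(x,y)\in D} \|g_{x,y}\|_2^2,
   \qquad g_{x,y} = \nabla_\theta \log p(y \mid x; \theta), \\
\|\nabla \mathcal{L}(\theta, D)\|_2^2
  &= \Big\|\tfrac{1}{|D|}\sum_{(x,y)\in D} g_{x,y}\Big\|_2^2
   \le \tfrac{1}{|D|}\sum_{(x,y)\in D} \|g_{x,y}\|_2^2
   = \imath(\theta; D)
   \quad \text{(Jensen's inequality)}, \\
\text{efficacy}(\theta; D)
  &= \frac{1}{\imath(\theta; D)}
   \le \frac{1}{\|\nabla \mathcal{L}(\theta, D)\|_2^2}.
\end{align*}
```

The middle line uses the convexity of the squared Euclidean norm; the sign of the averaged loss gradient does not matter because only its norm appears.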

Figures (8)

Figure 1: Distributions of efficacy scores (solid lines) and upper bounds (dashed lines) over all pre-trained models trained on the MNIST dataset (a) before and (b)-(d) after forgetting. Each distribution corresponds to a percentage of the target class. For reasons of readability we omit the percentage of 0.01. Both axes are log scaled.
Figure 2: Efficacy comparison w.r.t. the whole target class before training (Initial), after training (Pre-trained) and after forgetting (Retraining, Amnesiac Unlearning, Fisher Forgetting). Both axes are log scaled.
Figure 3: Distributions of membership inference attack mean probabilities over all pre-trained models trained on the MNIST dataset (a) before and (b)-(d) after forgetting. Each distribution corresponds to a percentage of the target class. Both axes are log scaled.
Figure 4: Log-log plot showing the relation between the efficacy and the membership inference attack mean probability (a) before and (b)-(d) after forgetting.
Figure 5: Distributions of efficacy scores (solid lines) and upper bounds (dashed lines) over all pre-trained models trained on the CIFAR-10 dataset (a) before and (b)-(d) after forgetting. Each distribution corresponds to a percentage of the target class. For reasons of readability we omit the percentage of 0.01. Both axes are log scaled.
...and 3 more figures

Theorems & Definitions (3)

Theorem (lower bound on the information score)
Proof of the theorem
Lemma
