Evaluating Machine Unlearning via Epistemic Uncertainty
Alexander Becker, Thomas Liebig
TL;DR
This work tackles the challenge of evaluating Machine Unlearning by proposing a metric based on epistemic uncertainty. It defines the information score $\imath(\theta; D) = \mathrm{tr}(\mathcal{I}(\theta; D))$ via the empirical Fisher Information and derives an efficacy measure $\text{efficacy}(\theta; D) = 1/\imath(\theta; D)$, along with a computable upper bound $\text{efficacy}(\theta; D) \le 1/\|\nabla \mathcal{L}(\theta, D)\|_2^2$ that avoids full data re-processing. The authors compare three forgetting approaches (Retraining, Amnesiac Unlearning, and Fisher Forgetting) on MNIST and CIFAR-10, showing that neither a drop in accuracy nor a defeated adversarial attack alone guarantees that sensitive information has been removed. The results show that the direction of the update matters: Retraining and Fisher Forgetting tend toward reduced information exposure, while Amnesiac Unlearning can drift the model back toward the original state, highlighting the need for multiple, complementary metrics. Overall, the paper provides a practical, scalable framework for evaluating forgetting and outlines directions for broader surveys and connections to privacy guarantees such as certified removal and differential privacy.
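To make the definitions above concrete, here is a minimal sketch that computes the information score, the efficacy, and the gradient-based upper bound for a toy logistic-regression model. It is an illustration, not the authors' implementation. The logistic model, the synthetic data, all function names, and the $1/n$ normalization of the empirical Fisher Information (the average of per-sample gradient outer products) are assumptions made for this example. Under that normalization the trace equals the mean squared norm of the per-sample gradients, and the bound follows from Jensen's inequality.

```python
import numpy as np


def per_sample_grads(theta, X, y):
    """Per-sample gradients of the logistic loss, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # predicted probabilities
    return (p - y)[:, None] * X


def information_score(theta, X, y):
    """Trace of the empirical Fisher (assumed 1/n-normalized):
    the mean squared L2 norm of the per-sample gradients."""
    g = per_sample_grads(theta, X, y)
    return float(np.mean(np.sum(g ** 2, axis=1)))


def efficacy(theta, X, y):
    """efficacy = 1 / information score."""
    return 1.0 / information_score(theta, X, y)


def efficacy_upper_bound(theta, X, y):
    """1 / ||gradient of the mean loss||_2^2.

    Only the aggregated gradient is needed. The value is undefined
    when that gradient is exactly zero."""
    g_mean = per_sample_grads(theta, X, y).mean(axis=0)
    return 1.0 / float(g_mean @ g_mean)


# Hypothetical toy data: 200 samples, 5 features, random binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(float)
theta = rng.normal(size=5)

eff = efficacy(theta, X, y)
bound = efficacy_upper_bound(theta, X, y)
print(f"efficacy = {eff:.4f}, upper bound = {bound:.4f}")
```

In this sketch the bound is computed from the same per-sample gradients for brevity. In practice its appeal is that the gradient of the mean loss can be accumulated batch by batch without materializing per-sample gradients.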
Abstract
Interest in Machine Unlearning has grown recently, primarily due to legal requirements such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act. As a result, multiple approaches have been proposed to remove the influence of specific target data points from a trained model. However, when evaluating the success of unlearning, current approaches either use adversarial attacks or compare their results to the optimal solution, which usually involves retraining from scratch. We argue that both approaches are insufficient in practice. In this work, we present an evaluation metric for Machine Unlearning algorithms based on epistemic uncertainty. To the best of our knowledge, this is the first definition of a general evaluation metric for Machine Unlearning.
