Fairness and Robustness in Machine Unlearning
Khoa Tran, Simon S. Woo
TL;DR
This work tackles the privacy-driven problem of removing the influence of specific data from pretrained neural networks (machine unlearning). It shows that existing approximated unlearning methods can fail to achieve exact unlearning and may impair fairness and robustness. The authors define a per-layer fairness-gap metric over normalization layers, $\epsilon^l = \max_c \sigma_c^l - \min_c \sigma_c^l$, and propose two conjectures linking a higher fairness gap to reduced robustness. Through CIFAR-10 experiments on ResNet-50 and SmallViT, they show that approximated unlearning often degrades robustness and fairness, while unlearning only the intermediate or last layers can improve efficiency without sacrificing performance. The work advocates using fairness-gap and robustness metrics to evaluate unlearning, and it provides practical guidance for designing fairer, more robust, and more computationally efficient unlearning procedures.
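The fairness-gap formula can be made concrete with a short sketch. It assumes that $\sigma_c^l$ is a per-channel standard deviation in normalization layer $l$, taken here as the square root of a BatchNorm-style running variance. That reading, the helper names `fairness_gap` and `fairness_gaps`, and the layer names are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Dict, Sequence


def fairness_gap(sigma_per_channel: Sequence[float]) -> float:
    """Return epsilon^l = max_c sigma_c^l - min_c sigma_c^l for a single layer."""
    if not sigma_per_channel:
        raise ValueError("need at least one per-channel statistic")
    return max(sigma_per_channel) - min(sigma_per_channel)


def fairness_gaps(running_vars: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Compute the gap for every normalization layer.

    Assumption (hypothetical): sigma_c^l = sqrt(running_var_c^l), i.e. the
    per-channel standard deviation tracked by a BatchNorm-style layer.
    """
    gaps = {}
    for layer_name, variances in running_vars.items():
        sigmas = [math.sqrt(v) for v in variances]  # variance -> std dev
        gaps[layer_name] = fairness_gap(sigmas)
    return gaps


if __name__ == "__main__":
    # Hypothetical running variances for two normalization layers.
    stats = {"bn1": [1.0, 4.0, 0.25], "bn2": [1.0, 1.0]}
    print(fairness_gaps(stats))  # {'bn1': 1.5, 'bn2': 0.0}
```

Under the conjectures above, a layer such as `bn1`, whose channels have widely spread statistics, would be read as less fair and potentially more vulnerable than `bn2`, whose gap is zero.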
Abstract
Machine unlearning poses the challenge of ``how to eliminate the influence of specific data from a pretrained model'' in response to privacy concerns. While prior research on approximated unlearning has demonstrated good accuracy and time efficiency, we claim that it falls short of achieving exact unlearning, and we are the first to focus on fairness and robustness in machine unlearning algorithms. Our study presents fairness conjectures for a well-trained model, based on the bias-variance trade-off, and considers their relevance to robustness. Our conjectures are supported by experiments on the two most widely used model architectures, ResNet and ViT, which demonstrate the correlation between fairness and robustness: \textit{the higher the fairness gap, the more sensitive and vulnerable the model}. In addition, our experiments demonstrate the vulnerability of current state-of-the-art approximated unlearning algorithms to adversarial attacks: their unlearned models suffer a significant drop in accuracy compared to exact-unlearned models. We argue that our fairness-gap measurement and robustness metric should be used to evaluate unlearning algorithms. Furthermore, we show that unlearning only the intermediate and last layers is sufficient and cost-effective in terms of time and memory.
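The robustness comparison can be sketched as an accuracy drop under attack. This summary does not specify the exact robustness metric, so the sketch assumes one simple choice: clean accuracy minus accuracy on adversarial inputs. Predictions are assumed to come from an attack run elsewhere. The helper names and the toy labels are hypothetical.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def accuracy_drop(clean_preds: Sequence[int],
                  adv_preds: Sequence[int],
                  labels: Sequence[int]) -> float:
    """Assumed robustness metric: clean accuracy minus adversarial accuracy.

    A larger value means the model is more vulnerable to the attack.
    """
    return accuracy(clean_preds, labels) - accuracy(adv_preds, labels)


if __name__ == "__main__":
    labels = [0, 1, 2, 3]
    clean = [0, 1, 2, 3]  # hypothetical predictions on clean inputs
    adv = [0, 1, 0, 0]    # hypothetical predictions on adversarial inputs
    print(accuracy_drop(clean, adv, labels))  # 0.5
```

To compare unlearning methods, the same function would be applied to an approximated-unlearned model and to an exact-unlearned (retrained) model on the same attacked inputs. A larger drop for the approximated model would indicate the vulnerability described above.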
