Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning

Binhao Ma; Tianhang Zheng; Hongsheng Hu; Di Wang; Shuo Wang; Zhongjie Ba; Zhan Qin; Kui Ren

TL;DR

This work exposes a vulnerability in machine unlearning by introducing the Unlearning Usability Attack, in which an attacker contributes benign yet information-rich data to train a model and then requests its erasure to collapse the unlearned model's utility. The method uses dataset condensation and distribution matching via Maximum Mean Discrepancy to synthesize Informative Benign Data, enabling effective attacks with as little as 1% of the training data and without modifying the samples submitted for unlearning. Across multiple datasets and architectures, the attack degrades accuracy far more than unlearning normal data does, and it remains difficult to detect with both passive and active defenses, exposing a critical vulnerability in MLaaS pipelines. The results motivate a reevaluation of poisoning defenses in the unlearning setting and highlight the need for robust unlearning mechanisms that balance privacy, performance, and resilience to such usability threats.
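
To make the distribution-matching idea concrete, the following is a minimal sketch of a Maximum Mean Discrepancy estimate between a large "real" set and a small candidate set. It is not the authors' implementation: the RBF kernel, the `bandwidth` value, the function names `rbf_kernel` and `mmd2`, and the Gaussian toy data are all assumptions chosen for illustration. A small set whose MMD to the real data is low carries much of the same distributional information, which is the property the summary attributes to Informative Benign Data.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances between the rows of a and b.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=2.0):
    # Biased estimator of the squared MMD between samples x and y
    # (hypothetical helper; kernel choice and bandwidth are assumptions).
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Toy data: a large "real" set, a small set drawn from the same distribution,
# and a small set drawn from a shifted distribution.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
matched = rng.normal(0.0, 1.0, size=(50, 8))
shifted = rng.normal(2.0, 1.0, size=(50, 8))

mmd_matched = mmd2(real, matched)
mmd_shifted = mmd2(real, shifted)
print(f"MMD^2(real, matched) = {mmd_matched:.4f}")
print(f"MMD^2(real, shifted) = {mmd_shifted:.4f}")
```

In a condensation setting, a quantity of this kind would typically be computed on learned feature embeddings and minimized with respect to the synthetic samples; the sketch above only evaluates it on fixed toy vectors.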

Abstract

Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning techniques efficiently remove data at low cost, recent research highlights vulnerabilities where malicious users could request unlearning on manipulated data to compromise the model. Although these attacks are effective, the perturbed data differ from the original training data and therefore fail hash verification. Existing attacks on machine unlearning also suffer from practical limitations and require substantial additional knowledge and resources. To fill the gaps in current unlearning attacks, we introduce the Unlearning Usability Attack. This model-agnostic, unlearning-agnostic, and budget-friendly attack distills data distribution information into a small set of benign data. These data are identified as benign by automatic poisoning detection tools due to their positive impact on model training. Although the data are benign for model training, unlearning them significantly degrades the model's information. Our evaluation demonstrates that unlearning this benign data, comprising no more than 1% of the total training data, can reduce model accuracy by up to 50%. Furthermore, our findings show that well-prepared benign data poses challenges for recent unlearning techniques, as erasing these synthetic instances demands more resources than erasing regular data. These insights underscore the need for future research to reconsider "data poisoning" in the context of machine unlearning.

Paper Structure

This paper contains 14 sections, 7 equations, 18 figures, and 11 tables.

Introduction
Related Work
Threat Model
Methodology
Experimental Settings
Attack Performance
Analysis of Attack Effectiveness
Resistance Against Poisoning Defenses
Conclusion
Ablation Studies
Different Amounts of Informative Benign Data in Scenario 1.
Different Network Architectures.
Different Amounts of Informative Benign Data in Scenario 2.
The impact of two types of one image on unlearning

Figures (18)

Figure 1: An overview of the unlearning usability attack. An attacker first contributes data to train the model and then revokes the contribution to crash the model.
Figure 2: The three attack scenarios in unlearning usability attacks.
Figure 3: The distinction between normal data and informative benign data after the unlearning process.
Figure 4: Model performance comparison between normal and informative benign data under multiple rounds of unlearning.
Figure 5: In scenario 1, we assess the information of informative ('S') and normal ('N') data across different networks. We train both types of data on two separate networks and then compare their accuracies to determine their relative information.
...and 13 more figures
