Unlearnable Examples Detection via Iterative Filtering

Yi Yu; Qichen Zheng; Siyuan Yang; Wenhan Yang; Jun Liu; Shijian Lu; Yap-Peng Tan; Kwok-Yan Lam; Alex Kot

TL;DR

This work addresses the challenge of unlearnable examples (UEs) in data poisoning by introducing Iterative Filtering (IF), a detection framework that does not rely on extra signals. IF adds $C$ extra classes and iteratively refines sample labels, exploiting the observation that models tend to learn UE shortcuts faster than clean data, which yields a clear separation between the two over successive iterations. The method is validated across multiple datasets (e.g., CIFAR-10/100 and an ImageNet subset) and UE generation techniques, achieving lower detection error rates than existing methods and remaining robust across poison ratios; ablations confirm the value of the extra classes and of iterative retrieval. IF can also strengthen purification-based defenses, supporting data-centric AI security by enabling reliable UE identification and the subsequent removal or purification of poisoned samples.
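The TL;DR describes IF only at a high level. The sketch below is one possible reading of the label-refinement loop, not the authors' code. It assumes (i) a small additional clean set whose labels are shifted to $y+C$ (as in the figure captions), (ii) that mixed-set samples predicted in $[C, 2C-1]$ are retrieved as clean and relabeled to $y+C$ for the next round, and (iii) that the loop stops once the retrieved set stops changing. A nearest-centroid classifier on 2-D toy features stands in for the neural network; `iterative_filtering`, `nearest_centroid`, and the toy data are hypothetical.

```python
from math import dist
from typing import Callable, Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Sample = Tuple[str, Point, int]  # (sample_id, features, original label in [0, C-1])
Predictor = Callable[[List[Tuple[Point, int]], Sequence[Point]], List[int]]


def nearest_centroid(train: List[Tuple[Point, int]], queries: Sequence[Point]) -> List[int]:
    """Hypothetical toy stand-in for a neural classifier: predict the closest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for (x, y), label in train:
        acc = sums.setdefault(label, [0.0, 0.0])
        acc[0] += x
        acc[1] += y
        counts[label] = counts.get(label, 0) + 1
    centroids = {k: (s[0] / counts[k], s[1] / counts[k]) for k, s in sums.items()}
    return [min(centroids, key=lambda k: dist(q, centroids[k])) for q in queries]


def iterative_filtering(mixed: List[Sample], extra_clean: List[Sample], num_classes: int,
                        predict: Predictor = nearest_centroid, max_iters: int = 10) -> Set[str]:
    """Assumed IF loop: return ids of mixed samples still flagged as UEs at convergence."""
    C = num_classes
    clean_ids: Set[str] = set()
    for _ in range(max_iters):
        # Additional clean data always uses the shifted labels y + C; mixed samples use
        # y + C only if they were retrieved as clean in the previous round (assumption).
        train = [(f, y + C) for _, f, y in extra_clean]
        train += [(f, y + C if sid in clean_ids else y) for sid, f, y in mixed]
        preds = predict(train, [f for _, f, _ in mixed])
        # Mixed samples pulled into the extra classes [C, 2C-1] are treated as clean.
        new_clean = {sid for (sid, _, _), p in zip(mixed, preds) if p >= C}
        if new_clean == clean_ids:
            break  # labels stopped changing
        clean_ids = new_clean
    # Whatever is still predicted with its original label is flagged as a UE.
    return {sid for sid, _, _ in mixed if sid not in clean_ids}


# Toy data: UEs carry a class-wise "shortcut" offset on the second coordinate.
mixed = [
    ("c0a", (0.0, 0.5), 0), ("c0b", (0.5, 0.0), 0),    # clean, class 0
    ("c1a", (10.0, 0.5), 1), ("c1b", (9.5, 0.0), 1),   # clean, class 1
    ("u0a", (0.0, 6.0), 0), ("u0b", (0.5, 6.0), 0),    # UEs, class 0
    ("u1a", (10.0, -6.0), 1), ("u1b", (9.5, -6.0), 1),  # UEs, class 1
]
extra_clean = [("e0", (0.0, 0.0), 0), ("e1", (10.0, 0.0), 1)]
print(sorted(iterative_filtering(mixed, extra_clean, num_classes=2)))
# prints ['u0a', 'u0b', 'u1a', 'u1b']
```

In this toy run the clean mixed samples sit close to the relabeled additional clean data and are retrieved in the first round, while the shortcut-shifted UEs stay attached to their original classes; the second round produces the same retrieved set, so the loop stops.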

Abstract

Deep neural networks have been shown to be vulnerable to data poisoning attacks. Recently, availability attacks, a specific type of data poisoning attack, have rendered data unusable for model learning by adding imperceptible perturbations to images. Detecting these poisoned samples, also known as Unlearnable Examples (UEs), within a mixed dataset is therefore both valuable and challenging. In response, we propose an Iterative Filtering approach for UE identification. This method leverages the distinction between inherent semantic mapping rules and shortcuts, without requiring any additional information. We verify that when a classifier is trained on a mixed dataset containing both UEs and clean data, the model tends to adapt to the UEs more quickly than to the clean data. Exploiting this accuracy gap between clean and poisoned samples, we employ a model that misclassifies clean samples while still correctly classifying the poisoned ones. Incorporating additional classes and iterative refinement further enhances the model's ability to differentiate between clean and poisoned samples. Extensive experiments demonstrate that our method outperforms state-of-the-art detection approaches across various attacks, datasets, and poison ratios, significantly reducing the Half Total Error Rate (HTER) compared to existing methods.
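The abstract reports results as the Half Total Error Rate (HTER). As a sketch, assuming the common definition of HTER as the average of the false-alarm rate (clean samples flagged as UEs) and the miss rate (UEs not flagged), a detector's HTER can be computed from the set of flagged sample ids; the function `hter` and its arguments are illustrative, not taken from the paper.

```python
from typing import Iterable, Set


def hter(flagged: Set[str], true_ues: Set[str], all_ids: Iterable[str]) -> float:
    """Half Total Error Rate: mean of the false-alarm rate and the miss rate."""
    ids = set(all_ids)
    clean = ids - true_ues
    ues = ids & true_ues
    if not clean or not ues:
        raise ValueError("HTER needs at least one clean sample and one UE")
    # Fraction of clean samples wrongly flagged as UEs (false alarms).
    clean_flagged_rate = len(flagged & clean) / len(clean)
    # Fraction of UEs the detector failed to flag (misses).
    ue_missed_rate = len(ues - flagged) / len(ues)
    return (clean_flagged_rate + ue_missed_rate) / 2


ids = [str(i) for i in range(10)]
print(hter(flagged={"0", "1", "2", "4"}, true_ues={"0", "1", "2", "3"}, all_ids=ids))
```

Here one of six clean samples is flagged and one of four UEs is missed, so HTER is (1/6 + 1/4) / 2 = 5/24, roughly 0.208. Because the two rates are averaged, the value does not depend on whether UEs or clean samples are treated as the positive class.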

Paper Structure (15 sections, 5 equations, 5 figures, 5 tables)

Introduction
Related Work
  Data poisoning
  Existing Defense against UEs
  Detection of Backdoor attacks
Preliminary
Methodology
  Key Intuition
  Iterative Filtering (IF) for UEs Detection
Experiments
  Experimental Setup
  Experimental Results
  Ablation Study
  Detection for Purification
Conclusion

Figures (5)

Figure 1: Test accuracy (%) on the unseen unlearnable data and clean data when training a classifier on a mixed dataset.
Figure 2: Test accuracy (%) on the unseen unlearnable data and clean data when training a classifier on a mixed dataset plus additional clean data with label set to $y+C$.
Figure 3: Detection performance vs. number of iterations for EM with an 80% poison ratio.
Figure 4: t-SNE visualizations on CIFAR-10 comparing models trained without and with additional clean data, where the labels of the additional clean data are set to $y\in[C,2C-1]$. UEs are generated by EM, and the shared training data consist of 50% UEs and 50% clean data.