Defending Against Repetitive Backdoor Attacks on Semi-supervised Learning through Lens of Rate-Distortion-Perception Trade-off

Cheng-Yi Lee; Ching-Chia Kao; Cheng-Han Yeh; Chun-Shien Lu; Chia-Mu Yu; Chu-Song Chen

TL;DR

This paper addresses backdoor attacks in semi-supervised learning (SSL) by purifying the unlabeled data, without requiring any clean labeled data. The proposed method, UPure, purifies unlabeled data in the frequency domain by perturbing high-frequency DCT components within a $\tau\times\tau$ region whose size is guided by the Rate-Distortion-Perception (RDP) trade-off. The authors provide a theoretical justification for the choice of perturbation region and demonstrate that UPure reduces attack success rates to near zero across multiple SSL algorithms and datasets while preserving benign accuracy. Empirically, UPure outperforms several state-of-the-art defenses and remains effective against repetitive triggers as well as visible and invisible ones, highlighting its practical value for securing SSL in real-world data pipelines.
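
The frequency-domain purification summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single orthonormal 2-D DCT-II over each full image channel (rather than, for example, 8×8 blocks) and takes "perturbing" to mean zeroing the $\tau\times\tau$ block of highest-frequency coefficients. Both choices, and the helper names `dct_matrix`, `purify_channel`, and `purify_image`, are hypothetical; the paper's actual perturbation strategies may differ.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of a length-n vector x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses the smaller normalisation
    return c


def purify_channel(x: np.ndarray, tau: int) -> np.ndarray:
    """Zero the tau x tau highest-frequency DCT coefficients of a 2-D array.

    Assumption (hypothetical): zeroing is the perturbation, applied to a
    whole-channel DCT; the paper may perturb these coefficients differently.
    """
    h, w = x.shape
    assert 0 <= tau <= min(h, w), "perturbation zone must fit in the spectrum"
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ x @ cw.T           # forward 2-D DCT-II
    coeffs[h - tau:, w - tau:] = 0.0  # bottom-right block = highest frequencies
    return ch.T @ coeffs @ cw        # inverse: C is orthogonal, so C^-1 = C^T


def purify_image(img: np.ndarray, tau: int) -> np.ndarray:
    """Apply purify_channel to every channel of an H x W x C image in [0, 1]."""
    out = np.stack([purify_channel(img[..., c], tau) for c in range(img.shape[-1])], axis=-1)
    return np.clip(out, 0.0, 1.0)  # keep pixel values in the valid range
```

With `tau = 0` the image is returned unchanged up to floating-point error; a larger `tau` discards more high-frequency content, that is, more distortion, which is the quantity the RDP analysis is used to size.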

Abstract

Semi-supervised learning (SSL) has achieved remarkable performance with only a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet. However, this large pool of untrusted data is highly vulnerable to data poisoning, which can lead to backdoor attacks. Current backdoor defenses are not yet effective against this vulnerability in SSL. In this study, we propose a novel method, Unlabeled Data Purification (UPure), which disrupts the association between trigger patterns and target classes by introducing perturbations in the frequency domain. Leveraging the Rate-Distortion-Perception (RDP) trade-off, we further identify the frequency band in which the perturbations are added and justify this choice. Notably, UPure purifies poisoned unlabeled data without requiring extra clean labeled data. Extensive experiments on four benchmark datasets and five SSL algorithms demonstrate that UPure reduces the attack success rate from 99.78% to 0% while maintaining model accuracy. Code is available at https://github.com/chengyi-chris/UPure.

Paper Structure

This paper contains 40 sections, 4 theorems, 16 equations, 11 figures, 10 tables, and 1 algorithm.

Introduction
Unlabeled data poisoning
Our Backdoor Defense on SSL
Related Work
Semi-Supervised Learning
Backdoor Attack
Backdoor Defense
Rate-Distortion-Perception Trade-off
Our Proposed Backdoor Defense
Problem Formulation
Unlabeled data Purification: UPure
Theoretical Analysis
Rate-Distortion-Perception trade-offs
Destroy Trigger Patterns
Experiments
...and 25 more sections

Key Result

Lemma 1

Consider an image of size $\mathsf{H} \times \mathsf{W}$ containing a single trigger pattern bounded by a rectangle of size $\mathsf{H}_t \times \mathsf{W}_t$, and let the cutout (devries2017improved) region be of size $\mathsf{H}_c \times \mathsf{W}_c$, where $\mathsf{H} > \mathsf{H}_c \ge \mathsf{H}_t$ and $\mathsf{W} > \cdots$ … where $\Phi(w) = \mathsf{H}_t - \frac{\alpha}{\mathsf{W}_t - w} = h$ and $1 \leq \alpha \leq \cdots$ …
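
Lemma 1's closed form is truncated on this page, so the sketch below only illustrates its geometric setting by exhaustive counting, under assumptions that are ours rather than the paper's: the trigger and cutout top-left corners are placed uniformly over all valid positions, the width condition mirrors the stated height condition ($\mathsf{W} > \mathsf{W}_c \ge \mathsf{W}_t$), and a "failure" means that at least $\alpha$ trigger pixels remain uncovered by the cutout. The result is therefore a hypothetical stand-in for $p_f^{single}$, not a reproduction of the lemma; `uncovered_pixels` and `failure_probability` are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def uncovered_pixels(tr, tc, ht, wt, cr, cc, hc, wc):
    """Number of trigger pixels NOT covered by the cutout (axis-aligned rectangles).

    (tr, tc) and (cr, cc) are the top-left corners of the trigger and the cutout.
    """
    oh = max(0, min(tr + ht, cr + hc) - max(tr, cr))  # overlap height
    ow = max(0, min(tc + wt, cc + wc) - max(tc, cc))  # overlap width
    return ht * wt - oh * ow


def failure_probability(H, W, ht, wt, hc, wc, alpha):
    """Exact P[uncovered trigger pixels >= alpha] under an ASSUMED model.

    Hypothetical model: trigger and cutout corners are independent and uniform
    over all positions that keep each rectangle inside the H x W image.
    The W-condition and the upper bound on alpha are our assumptions.
    """
    assert H > hc >= ht and W > wc >= wt and 1 <= alpha <= ht * wt
    fails = total = 0
    for tr, tc, cr, cc in product(range(H - ht + 1), range(W - wt + 1),
                                  range(H - hc + 1), range(W - wc + 1)):
        total += 1
        if uncovered_pixels(tr, tc, ht, wt, cr, cc, hc, wc) >= alpha:
            fails += 1
    return Fraction(fails, total)
```

For example, `failure_probability(4, 4, 1, 1, 3, 3, 1)` evaluates to `Fraction(7, 16)`: a 1×1 trigger escapes a 3×3 cutout in a 4×4 image in 28 of the 64 equally likely placements.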

Figures (11)

Figure 1: Illustration of UPure: (a) A benign sample can be turned into a malicious one by adding a backdoor trigger, which moves it off the image manifold; clipping then projects it back onto the manifold, yielding a poisoned sample. (b) A local mapping of the benign and poisoned samples to 2D space. Points A, B, and C represent three strategies for the frequency purification step of UPure (Section 3.2), which renders the poisoned sample ineffective.
Figure 2: Connection between RDP and our backdoor defense on SSL. Specifically, UPure perturbs the high-frequency components of the DCT spectrum to preserve perceptual quality. Through the RDP trade-off, we derive the size of the perturbation zone (i.e., the distortion) needed to eliminate the effect of repetitive backdoors.
Figure 3: Visualization of the RDP function of UPure on CIFAR10.
Figure 4: The failure probability $p_f^{single}$ with different $\mathsf{H}_c$, $\mathsf{W}_c$, $\mathsf{H}_t$, $\mathsf{W}_t$, and $\alpha$.
Figure 5: The failure probability $p_f^{repet}$ with different $q_k$ and $\beta$.
...and 6 more figures
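
For reference, the rate-distortion-perception function in the general form introduced by Blau and Michaeli (2019) is given below, where $\Delta$ is a distortion measure and $d$ is a divergence between the source distribution $p_X$ and the reconstruction distribution $p_{\hat{X}}$. Which specific $\Delta$ and $d$ UPure instantiates for Figures 2 and 3 is not stated on this page.

```latex
R(D, P) = \min_{p_{\hat{X} \mid X}} I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

$R(D, P)$ is non-increasing in both $D$ and $P$: loosening either the distortion budget or the perception constraint enlarges the feasible set and can only lower the required rate.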

Theorems & Definitions (6)

Definition 1
Lemma 1
Lemma 2
Theorem 1
Proof
Theorem 2
