Defending Against Repetitive Backdoor Attacks on Semi-supervised Learning through Lens of Rate-Distortion-Perception Trade-off

Cheng-Yi Lee, Ching-Chia Kao, Cheng-Han Yeh, Chun-Shien Lu, Chia-Mu Yu, Chu-Song Chen

TL;DR

This paper addresses backdoor attacks on semi-supervised learning (SSL) by purifying poisoned unlabeled data without requiring clean labeled data. The proposed method, UPure, works in the frequency domain: it perturbs the high-frequency DCT components within a $\tau\times\tau$ region, whose size is chosen using the Rate-Distortion-Perception (RDP) trade-off. The authors give a theoretical justification for the perturbation region and show that UPure reduces attack success rates to near zero across multiple SSL algorithms and datasets while preserving benign accuracy. Empirically, UPure outperforms several state-of-the-art defenses and remains effective against repetitive triggers as well as visible and invisible ones, which makes it practical for SSL pipelines that ingest untrusted data.

Abstract

Semi-supervised learning (SSL) has achieved remarkable performance with a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet. However, this large pool of untrusted data is extremely vulnerable to data poisoning, which can lead to backdoor attacks. Current backdoor defenses are not yet effective against this vulnerability in SSL. In this study, we propose a novel method, Unlabeled Data Purification (UPure), to disrupt the association between trigger patterns and target classes by introducing perturbations in the frequency domain. By leveraging the Rate-Distortion-Perception (RDP) trade-off, we further identify the frequency band in which the perturbations are added and justify this selection. Notably, UPure purifies poisoned unlabeled data without the need for extra clean labeled data. Extensive experiments on four benchmark datasets and five SSL algorithms demonstrate that UPure effectively reduces the attack success rate from 99.78% to 0% while maintaining model accuracy. Code is available at https://github.com/chengyi-chris/UPure.
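The frequency-domain purification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a single-channel image with values in [0, 1], assumes that the perturbed band is the $\tau\times\tau$ block in the bottom-right corner of the 2-D DCT spectrum, and assumes two simple perturbation strategies (zeroing and Gaussian noise). The names `dct_matrix` and `purify` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalisation
    return c


def purify(image: np.ndarray, tau: int, rng: np.random.Generator,
           mode: str = "noise") -> np.ndarray:
    """Perturb the tau x tau highest-frequency DCT coefficients.

    Assumption: the high-frequency band is the bottom-right corner
    of the spectrum; the paper's exact region may differ.
    """
    h, w = image.shape
    if not 0 <= tau <= min(h, w):
        raise ValueError("tau must lie in [0, min(h, w)]")
    if tau == 0:
        return image.copy()
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ image @ cw.T          # forward 2-D DCT
    block = coeffs[h - tau:, w - tau:]  # view into the high-frequency corner
    if mode == "zero":
        block[...] = 0.0
    else:
        # Replace the band with noise of comparable scale (an assumption).
        block[...] = rng.normal(0.0, block.std() + 1e-8, size=block.shape)
    restored = ch.T @ coeffs @ cw       # inverse 2-D DCT (C is orthonormal)
    return np.clip(restored, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.uniform(0.4, 0.6, size=(8, 8))  # toy stand-in for an unlabeled sample
purified = purify(image, tau=3, rng=rng, mode="zero")
```

In an SSL pipeline, a function like this would be applied to every unlabeled sample before training, since the defender does not know which samples are poisoned. Larger values of `tau` remove more of the spectrum and therefore trade more distortion for a stronger disruption of the trigger.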

Paper Structure (40 sections, 4 theorems, 16 equations, 11 figures, 10 tables, 1 algorithm)

This paper contains 40 sections, 4 theorems, 16 equations, 11 figures, 10 tables, 1 algorithm.

Key Result

Lemma 1

Consider an image of size $\mathsf{H} \times \mathsf{W}$ with a single trigger pattern bounded by a rectangle of size $\mathsf{H}_t \times \mathsf{W}_t$, and a cutout (DeVries and Taylor, 2017) region of size $\mathsf{H}_c \times \mathsf{W}_c$, where $\mathsf{H} > \mathsf{H}_c \ge \mathsf{H}_t$ and $\mathsf{W} > {}$[…] where $\Phi(w) = \mathsf{H}_t - \frac{\alpha}{\mathsf{W}_t - w} = h$ and $1 \leq \alpha \leq {}$[…]
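The setting of Lemma 1, a cutout region placed at random over an image that contains one rectangular trigger, can be explored numerically. The sketch below is not the lemma's closed form, which is truncated above. It makes its own assumptions: the cutout's top-left corner is uniform over all positions where the region fits entirely inside the image, and a placement counts as a failure when at least `alpha` trigger pixels remain uncovered. That reading of $\alpha$ and the function names are hypothetical.

```python
from fractions import Fraction


def overlap(a0: int, a_len: int, b0: int, b_len: int) -> int:
    """Length of the overlap between intervals [a0, a0+a_len) and [b0, b0+b_len)."""
    return max(0, min(a0 + a_len, b0 + b_len) - max(a0, b0))


def failure_probability(H: int, W: int, Ht: int, Wt: int, Hc: int, Wc: int,
                        ty: int, tx: int, alpha: int = 1) -> Fraction:
    """Exact fraction of cutout placements that leave >= alpha trigger pixels.

    (ty, tx) is the top-left corner of the trigger. Every cutout position
    that fits inside the image is enumerated once (uniform placement is
    an assumption of this sketch).
    """
    fails = 0
    total = 0
    for cy in range(H - Hc + 1):
        for cx in range(W - Wc + 1):
            covered = overlap(cy, Hc, ty, Ht) * overlap(cx, Wc, tx, Wt)
            uncovered = Ht * Wt - covered
            fails += uncovered >= alpha
            total += 1
    return Fraction(fails, total)


# A 2x2 trigger in the corner of an 8x8 image with a 4x4 cutout:
# only one of the 25 placements covers the whole trigger.
print(failure_probability(8, 8, 2, 2, 4, 4, ty=0, tx=0, alpha=1))  # 24/25
```

Even in this toy setting a single random cutout rarely removes a corner trigger, which is consistent with the motivation for a defense that does not depend on the trigger's spatial location.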

Figures (11)

  • Figure 1: Illustration of UPure: (a) A benign sample can be turned into a malicious one by adding a backdoor trigger, which moves it off the image manifold; clipping then projects it back onto the manifold, yielding a poisoned sample. (b) A local mapping of the benign and poisoned samples to 2D space. Points A, B, and C represent three strategies in the frequency purification step of UPure (Section 3.2), each of which renders the poisoned sample ineffective.
  • Figure 2: Connection between RDP and our backdoor defense method on SSL. Specifically, UPure perturbs the high-frequency components of the DCT spectrum to preserve perceptual quality. From the RDP trade-off, we derive the size of the perturbation zone (i.e., the distortion) needed to eliminate the effect of repetitive backdoor triggers.
  • Figure 3: Visualization of the RDP function of UPure on CIFAR10.
  • Figure 4: The failure probability $p_f^{single}$ with different $\mathsf{H}_c$, $\mathsf{W}_c$, $\mathsf{H}_t$, $\mathsf{W}_t$, and $\alpha$.
  • Figure 5: The failure probability $p_f^{repet}$ with different $q_k$ and $\beta$.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Proof
  • Theorem 2