Filter, Obstruct and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning

Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shao-yuan Li, Songcan Chen

TL;DR

This work tackles the vulnerability of semi-supervised learning to backdoor data-poisoning by introducing the Backdoor Invalidator (BI), a plug-in defense that filters triggers with a Gaussian pre-processing step, obstructs trigger–class correlations via complementary learning with complementary labels, and dilutes backdoor effects through trigger mix-up and a two-stage training protocol. The authors establish theoretical guarantees showing that replacing the standard consistency loss with a complementary loss can recover the optimal classifier, and provide a generalization bound for the proposed approach. Empirically, BI markedly reduces backdoor attack success rates across multiple SSL methods and datasets while largely preserving clean-data accuracy, demonstrating strong practical robustness and adaptability. The method requires no explicit detection step and offers a principled, modular defense that can be integrated with existing SSL pipelines to mitigate backdoor risks in real-world deployments.

Abstract

Recent studies have verified that semi-supervised learning (SSL) is vulnerable to data poisoning backdoor attacks. Even a tiny fraction of contaminated training data is sufficient for adversaries to manipulate up to 90% of the test outputs in existing SSL methods. Given the emerging threat of backdoor attacks designed for SSL, this work aims to protect SSL against such risks, marking it as one of the few known efforts in this area. Specifically, we begin by identifying that the spurious correlations between the backdoor triggers and the target class implanted by adversaries are the primary cause of manipulated model predictions during the test phase. To disrupt these correlations, we utilize three key techniques: Gaussian Filter, complementary learning and trigger mix-up, which collectively filter, obstruct and dilute the influence of backdoor attacks in both data pre-processing and feature learning. Experimental results demonstrate that our proposed method, Backdoor Invalidator (BI), significantly reduces the average attack success rate from 84.7% to 1.8% across different state-of-the-art backdoor attacks. It is also worth mentioning that BI does not sacrifice accuracy on clean data and is supported by a theoretical guarantee of its generalization capability.
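
To make the "filter" step concrete, the following is a minimal sketch of Gaussian pre-processing on a single-channel image, written in plain Python. It is an illustration only, not the authors' implementation: the kernel width (`sigma`), the kernel radius, the edge handling, and the checkerboard-style trigger are all assumptions chosen for the example, and the helper names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, replicating the edge pixels."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the image
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def add_checkerboard(image, amplitude):
    """Hypothetical high-frequency trigger: a +/- amplitude checkerboard."""
    return [[v + amplitude * (1 if (i + j) % 2 == 0 else -1)
             for j, v in enumerate(row)]
            for i, row in enumerate(image)]


def max_deviation(image, baseline):
    """Largest absolute difference between an image and a constant baseline."""
    return max(abs(v - baseline) for row in image for v in row)
```

For a flat gray image with such a trigger added, `max_deviation(gaussian_blur(poisoned), 0.5)` comes out smaller than the trigger amplitude, since a low-pass filter attenuates high-frequency patterns. Triggers with mostly low-frequency content would be attenuated far less by this step alone.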

Paper Structure

This paper contains 32 sections, 6 theorems, 28 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that the transition matrix $\mathbf{Q}$ is invertible and that Assumption identity is satisfied. Then the minimizer $\bar{f}^*$ of $\bar{R}(f)$ coincides with the minimizer $f^*$ of $R(f)$, i.e., $\bar{f}^*=f^*$.
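
The role of the invertibility condition can be illustrated with a small sketch. It assumes the uniform transition matrix that is common in complementary-label learning (zero on the diagonal, $1/(K-1)$ elsewhere); the source does not specify the form of $\mathbf{Q}$, so this choice, the forward-corrected log loss, and all function names are assumptions for illustration rather than the paper's construction.

```python
import math


def uniform_transition(k):
    """Assumed Q: a complementary label is drawn uniformly from the
    k - 1 classes other than the true one."""
    return [[0.0 if i == j else 1.0 / (k - 1) for j in range(k)]
            for i in range(k)]


def complementary_distribution(p, q_matrix):
    """Distribution over complementary labels implied by posterior p (Q^T p)."""
    k = len(p)
    return [sum(p[i] * q_matrix[i][j] for i in range(k)) for j in range(k)]


def recover_posterior(q_bar):
    """Invert the uniform Q in closed form: p_j = 1 - (k - 1) * q_bar_j."""
    k = len(q_bar)
    return [1.0 - (k - 1) * q for q in q_bar]


def complementary_loss(p, comp_label, q_matrix, eps=1e-12):
    """Forward-corrected negative log-likelihood of a complementary label,
    i.e. of the statement 'this sample is NOT class comp_label'."""
    q_bar = complementary_distribution(p, q_matrix)
    return -math.log(q_bar[comp_label] + eps)
```

With `p = [0.7, 0.2, 0.1]`, the implied complementary distribution is `[0.15, 0.4, 0.45]` up to floating-point rounding, and `recover_posterior` maps it back to `p`. This one-to-one correspondence is what an invertible $\mathbf{Q}$ provides: no information about the class posterior is lost when training on complementary labels, which is the intuition behind the two risks sharing a minimizer.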

Figures (11)

  • Figure 1: Following previous settings [yan2021deep], poisoned data is introduced exclusively into the unlabeled set, as the labeled set is typically subjected to careful inspection. Our goal is to prevent adversaries from manipulating test data outputs from the true label to the targeted one under the poisoned dataset.
  • Figure 2: Visualization of the mechanism behind successful backdoor attacks in SSL from a causal perspective.
  • Figure 3: Visualization of two successful backdoor triggers (including Gaussian Filter) under different attack intensities [shejwalkar2023perils, wang2022invisible]. For enhanced visualization, the trigger patterns in the second row are displayed with a $10\times$ intensity amplification.
  • Figure 4: The data mix-up does not compromise the trigger pattern, so the trigger pattern becomes more associated with the class "horse" than with the target class "bird". A similar phenomenon also exists in many other backdoor attack triggers.
  • Figure 5: Sensitivity analysis on $\gamma$.
  • ...and 6 more figures

Theorems & Definitions (11)

  • Theorem 1
  • Theorem 2
  • Proof 1
  • Lemma 1
  • Proof 2
  • Lemma 2
  • Proof 3
  • Lemma 3
  • Proof 4
  • Theorem 3
  • ...and 1 more