
ReVeil: Unconstrained Concealed Backdoor Attack on Deep Neural Networks using Machine Unlearning

Manaar Alam, Hithem Lamri, Michail Maniatakos

TL;DR

ReVeil introduces a concealed backdoor attack on the data collection phase that requires no access to the target model and no auxiliary data. By injecting camouflage samples (poisoned samples perturbed with isotropic Gaussian noise) into the training data, it substantially lowers the pre-deployment attack success rate (ASR) while preserving the latent backdoor, which lets it evade common defenses. After deployment, an exact unlearning process removes the camouflage samples and restores high ASR with benign accuracy (BA) largely unaffected, demonstrating practical viability and resilience across multiple datasets and triggers. The work also discusses extensions to multi-target backdoors, approximate unlearning, and potential defenses, and highlights implications for ML security and data privacy.
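The camouflage idea above reduces to adding isotropic Gaussian noise to an already poisoned image. The NumPy sketch below illustrates only that perturbation step, under stated assumptions: images are float arrays of shape (H, W, C) in [0, 1]; a hypothetical square-patch trigger stands in for "BadNets"; $\sigma = 10^{-3}$ and $c_r = 5$ are the values quoted in the figure captions; and reading $c_r$ as "camouflage copies per poison sample" is an assumption. How camouflage samples are labeled is not stated in this summary, so labels are omitted.

```python
import numpy as np


def add_square_trigger(image: np.ndarray, patch_size: int = 3, value: float = 1.0) -> np.ndarray:
    """Stamp a solid square patch in the bottom-right corner.

    Hypothetical BadNets-style trigger: patch size, position, and value are
    illustrative assumptions, not the paper's settings.
    """
    triggered = image.copy()
    triggered[-patch_size:, -patch_size:, :] = value
    return triggered


def make_camouflage_samples(poison_image: np.ndarray, c_r: int, sigma: float,
                            rng: np.random.Generator) -> np.ndarray:
    """Return c_r noisy copies of one poisoned image.

    Each copy adds independent isotropic Gaussian noise N(0, sigma^2 I).
    Treating c_r as "copies per poison sample" is an assumption.
    No clipping is applied, so values may land slightly outside [0, 1].
    """
    noise = rng.normal(loc=0.0, scale=sigma, size=(c_r, *poison_image.shape))
    return poison_image[np.newaxis, ...] + noise


rng = np.random.default_rng(seed=0)
clean = rng.random((32, 32, 3))  # stand-in for a CIFAR10-sized image in [0, 1]
poison = add_square_trigger(clean)
camouflage = make_camouflage_samples(poison, c_r=5, sigma=1e-3, rng=rng)
print(camouflage.shape)  # (5, 32, 32, 3)
```

With $\sigma = 10^{-3}$ the perturbation is visually negligible. According to Figure 2's caption, training with such noisy poison samples reduces the model's attention to the trigger.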

Abstract

Backdoor attacks embed hidden functionalities in deep neural networks (DNNs) that trigger malicious behavior on specific inputs. Advanced defenses detect such attacks by monitoring anomalous DNN inferences. Concealed backdoors, however, evade detection by keeping the attack success rate (ASR) low before deployment and restoring high ASR after deployment via machine unlearning. Existing concealed backdoors typically require white-box or black-box model access or auxiliary data, which limits their practicality when such access or data is unavailable. This paper introduces ReVeil, a concealed backdoor attack that targets the data collection phase of the DNN training pipeline and requires neither model access nor auxiliary data. ReVeil maintains low pre-deployment ASR across four datasets and four trigger patterns, evades three popular backdoor detection methods, and restores high ASR after deployment through machine unlearning.
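The abstract contrasts a low pre-deployment ASR with a high post-deployment ASR, and the claim that benign accuracy (BA) stays largely unaffected is the other half of the evaluation. The sketch below shows how these two metrics are commonly computed from model predictions. The function names are hypothetical. Excluding test inputs whose true label already equals the target class from ASR is a widespread convention assumed here, because this summary does not spell out ReVeil's exact evaluation protocol.

```python
import numpy as np


def benign_accuracy(preds_clean, labels) -> float:
    """BA: fraction of clean (trigger-free) test inputs classified correctly."""
    preds_clean = np.asarray(preds_clean)
    labels = np.asarray(labels)
    return float(np.mean(preds_clean == labels))


def attack_success_rate(preds_triggered, labels, target_class: int) -> float:
    """ASR: fraction of trigger-embedded inputs classified as the target class.

    Inputs whose true label already equals target_class are excluded; this is
    a common convention assumed here, not taken from the paper.
    """
    preds_triggered = np.asarray(preds_triggered)
    labels = np.asarray(labels)
    keep = labels != target_class
    if not keep.any():
        raise ValueError("no non-target test inputs to evaluate")
    return float(np.mean(preds_triggered[keep] == target_class))


labels = [0, 1, 2, 3]
print(benign_accuracy([0, 1, 2, 0], labels))                      # 0.75
print(attack_success_rate([0, 0, 0, 3], labels, target_class=0))  # 0.6666666666666666
```

A concealed backdoor aims for the first number to stay high throughout, while the second stays low until the camouflage samples are unlearned.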

Paper Structure

This paper contains 7 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of ReVeil. Data Poisoning: the adversary crafts both poison and camouflage samples; Trigger Injection: the poisoned data is submitted for model training; Backdoor Restoration: the adversary restores the backdoor by requesting unlearning of the camouflage samples; Backdoor Exploitation: the adversary uses trigger-embedded samples to cause misclassifications. Unlike in traditional backdoor attacks, the backdoor remains concealed during evaluation and is revealed only after the unlearning requests.
  • Figure 2: (Top Row) Randomly selected CIFAR10 images with the 'BadNets' trigger; (Middle Row) GradCAM results for $f_{\theta}^{\mathcal{B}}$, showing strong focus on the trigger; (Bottom Row) GradCAM results for $f_{\theta}^{\mathcal{N}}$, showing reduced attention to the trigger due to training with noisy poison samples.
  • Figure 3: ASR heatmaps for various attack methods and datasets across varying $c_{r}$ with $\sigma = 10^{-3}$.
  • Figure 4: BA and ASR for $\mathcal{A}_1$ across different datasets as a function of the noise standard deviation $\sigma$, with $c_{r} = 5$.
  • Figure 5: BA and ASR performance comparison across three scenarios: poisoning (without camouflage), camouflaging (with ReVeil camouflage examples), and unlearning (after removing camouflage using unlearning) for different datasets and attack methods with $c_{r}=5$ and $\sigma=10^{-3}$.
  • ...and 3 more figures
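Figure 1's Backdoor Restoration step depends on the model owner honoring unlearning requests for the camouflage samples, and the TL;DR specifies exact unlearning. In its simplest form, exact unlearning discards the requested samples and retrains from scratch on the remaining data, so the resulting model is as if those samples had never been collected. Sharded schemes such as SISA reduce the retraining cost but are not shown. The sketch below is a framework-agnostic illustration of that naive form. `Sample`, `exact_unlearn`, and the toy trainer are hypothetical names, not the paper's code or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # whatever model type the trainer returns


@dataclass(frozen=True)
class Sample:
    sample_id: int
    kind: str  # bookkeeping tag for this example only ("clean", "poison", "camouflage")


def exact_unlearn(train_set: List[Sample], forget_ids: Iterable[int],
                  train_fn: Callable[[List[Sample]], M]) -> Tuple[M, List[Sample]]:
    """Naive exact unlearning: drop the requested samples, then retrain from scratch."""
    forget = set(forget_ids)
    retained = [s for s in train_set if s.sample_id not in forget]
    return train_fn(retained), retained


def toy_train(samples: List[Sample]) -> dict:
    """Hypothetical stand-in trainer that only records which samples it was fit on."""
    return {"trained_on": [s.sample_id for s in samples]}


data = [Sample(0, "clean"), Sample(1, "clean"), Sample(2, "poison"),
        Sample(3, "camouflage"), Sample(4, "camouflage")]
model, retained = exact_unlearn(data, forget_ids=[3, 4], train_fn=toy_train)
print(model)  # {'trained_on': [0, 1, 2]}
```

After the camouflage samples are dropped, only clean and poison samples shape the retrained model. Per the TL;DR, this is what restores high ASR.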