Hide in Plain Sight: Clean-Label Backdoor for Auditing Membership Inference

Depeng Chen, Hao Chen, Hulin Jin, Jie Cui, Hong Zhong

TL;DR

This work proposes a novel clean-label backdoor-based approach to membership inference attacks (MIAs) for robust and stealthy data auditing. The approach requires only black-box access to the target model and addresses challenges related to trigger stealthiness and poisoning durability.

Abstract

Membership inference attacks (MIAs) are critical tools for assessing privacy risks and ensuring compliance with regulations like the General Data Protection Regulation (GDPR). However, their potential for auditing unauthorized use of data remains underexplored. To bridge this gap, we propose a novel clean-label backdoor-based approach for MIAs, designed specifically for robust and stealthy data auditing. Unlike conventional methods that rely on detectable poisoned samples with altered labels, our approach retains natural labels, enhancing stealthiness even at low poisoning rates. Our approach employs an optimal trigger generated by a shadow model that mimics the target model's behavior. This design minimizes the feature-space distance between triggered samples and the source class while preserving the original data labels. The result is a powerful and undetectable auditing mechanism that overcomes limitations of existing approaches, such as label inconsistencies and visual artifacts in poisoned samples. The proposed method enables robust data auditing through black-box access, achieving high attack success rates across diverse datasets and model architectures. Additionally, it addresses challenges related to trigger stealthiness and poisoning durability, establishing itself as a practical and effective solution for data auditing. Comprehensive experiments validate the efficacy and generalizability of our approach, which outperforms several baseline methods in both stealth and attack success metrics.
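
The trigger objective described in the abstract (minimizing the feature-space distance between triggered samples and the source class under a bounded perturbation) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy linear feature extractor (`weights`) as a hypothetical stand-in for the shadow model, a hypothetical `centroid` representing the mean source-class feature vector, a squared L2 distance, and projected gradient descent on a single shared trigger. The bound 16/255 is the perturbation limit quoted in the Figure 2 caption.

```python
import random

EPSILON = 16 / 255  # perturbation limit quoted in the Figure 2 caption


def features(weights, x):
    """Toy linear feature extractor (assumed stand-in for the shadow model)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def feature_distance(weights, x, delta, centroid):
    """Squared L2 distance between the features of x + delta and the centroid."""
    triggered = [v + d for v, d in zip(x, delta)]
    return sum((f - c) ** 2 for f, c in zip(features(weights, triggered), centroid))


def optimize_trigger(weights, samples, centroid, epsilon=EPSILON, steps=100, lr=0.01):
    """Projected gradient descent on one shared, L-infinity-bounded trigger."""
    dim = len(samples[0])
    delta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x in samples:
            triggered = [v + d for v, d in zip(x, delta)]
            residual = [f - c for f, c in zip(features(weights, triggered), centroid)]
            # Gradient of ||W(x + delta) - c||^2 w.r.t. delta is 2 * W^T residual.
            for j in range(dim):
                grad[j] += 2 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
        # Averaged gradient step, then projection back onto the epsilon-ball.
        delta = [
            max(-epsilon, min(epsilon, d - lr * g / len(samples)))
            for d, g in zip(delta, grad)
        ]
    return delta


# Hypothetical toy data: 8-dimensional "images", 4-dimensional features.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
samples = [[rng.random() for _ in range(8)] for _ in range(5)]
centroid = [rng.uniform(-1, 1) for _ in range(4)]

trigger = optimize_trigger(weights, samples, centroid)
before = sum(feature_distance(weights, x, [0.0] * 8, centroid) for x in samples)
after = sum(feature_distance(weights, x, trigger, centroid) for x in samples)
print(f"feature distance before: {before:.4f}, after: {after:.4f}")
```

In a realistic setting the linear map would be replaced by a trained network with automatic differentiation, and the triggered images would also be clipped to the valid pixel range. Labels are never modified in this loop, which is what makes the poisoning clean-label.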

Paper Structure

This paper contains 32 sections, 5 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: The process of membership inference attack under black-box and white-box attacks. This illustration highlights the differences in attack methods based on the level of knowledge available to the attacker: black-box attacks rely on the model’s outputs, while white-box attacks utilize full access to the model’s internals.
  • Figure 2: Examples of poisoned samples for the target class "cat". Left: Original sample. Middle: Dirty-label attack with a visible trigger (white square in the bottom right corner). Right: Clean-label attack sample with perturbation limit $\epsilon = 16/255$. The label of each sample is shown above it. The clean-label poisoned samples look like natural ones. Our approach overcomes the drawbacks of easily discovered triggers and inconsistent labels.
  • Figure 3: An illustration of membership inference via a clean-label backdoor approach. The source label is "dog", the target label is "cat", and the trigger is trained by the shadow model. The attack method can be divided into three stages: a) Trigger generation: the data owner trains the shadow model and uses it to train the trigger; b) Model training: poisoned samples are injected into the training set used for model training; c) Membership inference: the data owner performs membership inference by querying the target model.
  • Figure 4: ASR under different numbers of poisoned samples. We plot the curves on ResNet18 using the CIFAR-10 dataset.
  • Figure 5: Visualization of latent-space features extracted from the ResNet18 feature extractor for differently poisoned CIFAR-10 classifiers. The source category is 3 (cat) and the target category is 2 (bird). (a) Dirty-label attack on untrained ResNet18. (b) Clean-label attack on untrained ResNet18. (c) Clean-label attack on trained clean ResNet18. (d) Clean-label attack on trained poisoned ResNet18. Colored points are clean training samples, while dark star markers are poisoned samples.
  • ...and 3 more figures
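
Stage c) of Figure 3, in which the data owner decides membership by querying the target model, might be sketched as follows. The decision rule here is an assumption and is not taken from the paper: the owner sends triggered source-class samples to a black-box `predict` function, counts how often the target label is returned, and applies a one-sided exact binomial test against an assumed `baseline_rate` (the rate at which a model never trained on the marked data would return the target label). The two toy models and all names are hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p):
    """P[X >= successes] for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def audit(predict, triggered_samples, target_label, baseline_rate, alpha=0.05):
    """Black-box audit: flag membership if the attack success rate beats the baseline."""
    trials = len(triggered_samples)
    hits = sum(1 for x in triggered_samples if predict(x) == target_label)
    p_value = binomial_tail(hits, trials, baseline_rate)
    return {"asr": hits / trials, "p_value": p_value, "member": p_value < alpha}


# Hypothetical stand-ins for the black-box target model. Queries are just
# integers here; a real audit would send triggered images.
def backdoored_model(x):
    return "dog" if x % 10 == 0 else "cat"  # trigger works on 27 of 30 queries


def clean_model(x):
    return "cat" if x % 10 == 0 else "dog"  # target label on only 3 of 30 queries


queries = list(range(30))
result_backdoored = audit(backdoored_model, queries, "cat", baseline_rate=0.1)
result_clean = audit(clean_model, queries, "cat", baseline_rate=0.1)
print(result_backdoored["member"], result_clean["member"])
```

Only predicted labels are used, which matches the black-box setting described in the abstract. The baseline rate and significance level would need to be calibrated on models known not to contain the marked data.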