A Method to Facilitate Membership Inference Attacks in Deep Learning Models

Zitao Chen, Karthik Pattabiraman

TL;DR

The paper presents a novel code-poisoning–driven membership inference attack that operates with black-box model access by embedding membership information into secret synthetic samples memorized by the model. It shows that, with a complete attack including a dual normalization scheme, the adversary can achieve near-perfect membership leakage across all training samples while preserving model accuracy, and can conceal this leakage from standard privacy audits. The results, demonstrated across multiple datasets and architectures, indicate that existing auditing and defenses are insufficient against such backdoor-like leakage, highlighting a critical gap in current privacy protections. The work thus emphasizes the need to rethink privacy auditing in ML systems, improve training-code integrity, and develop defenses that address code-poisoning–driven attacks.

Abstract

Modern machine learning (ML) ecosystems offer a surging number of ML frameworks and code repositories that can greatly facilitate the development of ML models. Today, even ordinary data holders who are not ML experts can apply off-the-shelf codebases to build high-performance ML models on their data, much of which is sensitive in nature (e.g., clinical records). In this work, we consider a malicious ML provider who supplies model-training code to the data holders, does not have access to the training process, and has only black-box query access to the resulting model. In this setting, we demonstrate a new form of membership inference attack that is strictly more powerful than prior art. Our attack empowers the adversary to reliably de-identify all the training samples (average >99% attack TPR@0.1% FPR), and the compromised models still maintain performance competitive with their uncorrupted counterparts (average <1% accuracy drop). Moreover, we show that the poisoned models can effectively disguise the amplified membership leakage under common membership privacy auditing; the leakage can only be revealed by a set of secret samples known to the adversary. Overall, our study not only points to the worst-case membership privacy leakage, but also unveils a common pitfall underlying existing privacy auditing methods, which calls for future efforts to rethink the current practice of auditing membership privacy in machine learning models.
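
The headline metric in the abstract, TPR@0.1% FPR, is the fraction of true training members an attack identifies when its decision threshold is set so strictly that at most 0.1% of non-members are wrongly flagged. The sketch below shows one common way to compute such a number from per-sample attack scores. It is illustrative only and is not taken from the paper: the function name `tpr_at_fpr`, the convention that a higher score means "more likely a member", and the toy scores are all assumptions.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.001):
    """Return (TPR, achieved FPR) at the loosest threshold with FPR <= target_fpr.

    Assumption of this sketch: a higher score means "more likely a member".
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    # How many non-members may be misclassified as members at most.
    allowed_fp = int(target_fpr * len(nonmember_scores))
    ranked = sorted(nonmember_scores, reverse=True)
    # Predict "member" only when score > threshold, so at most
    # `allowed_fp` non-members can exceed it.
    threshold = ranked[allowed_fp] if allowed_fp < len(ranked) else float("-inf")
    tp = sum(1 for s in member_scores if s > threshold)
    fp = sum(1 for s in nonmember_scores if s > threshold)
    return tp / len(member_scores), fp / len(nonmember_scores)


# Toy scores (hypothetical), evaluated at a 25% FPR budget for readability.
members = [0.9, 0.8, 0.7, 0.2]
nonmembers = [0.75, 0.3, 0.1, 0.05]
tpr, fpr = tpr_at_fpr(members, nonmembers, target_fpr=0.25)
print(tpr, fpr)
```

Reporting TPR at a very low FPR, rather than average accuracy, emphasizes how many members can be identified with high confidence. Note that a 0.1% budget only becomes meaningful with at least 1,000 non-member scores; with fewer, this sketch allows zero false positives.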

Paper Structure

This paper contains 37 sections, 2 equations, 22 figures, 4 tables, and 1 algorithm.

Figures (22)

  • Figure 1: Code-poisoned model exhibits similar accuracy and MIA risk (under standard MIA evaluation) as the uncorrupted model, while allowing the black-box adversary to secretly de-identify all training samples (example from CIFAR10).
  • Figure 2: Analyzing the trade-off between preserving high model accuracy and inflicting high privacy leakage [tramer2022truth].
  • Figure 3: Loss-value computation function. Our attack creates a secret membership-encoding sample ($x^*$) from each training sample, both of which have the same membership. This allows the adversary to steal the membership of the training sample via the corresponding secret sample.
  • Figure 4: Standard and stealthy membership inference procedures. The former can be carried out by any party, while the latter can only be exploited by those aware of the malicious constructs in the training code, such as the adversary.
  • Figure 5: Visualizing the normalization statistics estimated by the normalization layer on different types of inputs. Processing training and membership-encoding (synthetic) samples together skews the statistics (green line), which is the key factor that limits the success of our attack.
  • ...and 17 more figures
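
The Figure 5 caption attributes the attack's limitation to skewed normalization statistics when training and synthetic samples are processed together. The toy sketch below illustrates the general effect: the mean and variance estimated over a mix of two differently distributed input types match neither type on its own. It is not the paper's experiment; the Gaussian "activations", their parameters, and the helper `batch_stats` are assumptions made purely for illustration.

```python
import random
import statistics


def batch_stats(values):
    """Mean and population variance, as a normalization layer might estimate per batch."""
    return statistics.fmean(values), statistics.pvariance(values)


random.seed(0)
# Hypothetical 1-D activations: the two input types follow different distributions.
train_like = [random.gauss(0.0, 1.0) for _ in range(1000)]
synthetic_like = [random.gauss(4.0, 1.0) for _ in range(1000)]

mean_t, var_t = batch_stats(train_like)
mean_s, var_s = batch_stats(synthetic_like)
# Statistics estimated when both input types share one batch.
mean_m, var_m = batch_stats(train_like + synthetic_like)

print(f"train-like:     mean={mean_t:.2f} var={var_t:.2f}")
print(f"synthetic-like: mean={mean_s:.2f} var={var_s:.2f}")
print(f"mixed:          mean={mean_m:.2f} var={var_m:.2f}")
```

In this toy setting the mixed mean falls between the two per-type means and the mixed variance exceeds both per-type variances, so normalizing either input type with the mixed statistics would shift and rescale it incorrectly.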