
Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks

Haonan Shi, Tu Ouyang, An Wang

TL;DR

This work tackles the privacy risk of training-data leakage through Membership Inference Attacks (MIAs) by introducing LDC-MIA, a learning-based difficulty-calibration attack. LDC-MIA combines four inputs to train a lightweight MIA classifier: the target model's membership score, a calibrated score obtained from a reference model, class labels, and neighborhood information. It achieves up to 4x higher TPR at low FPR and superior AUC across multiple datasets. The approach keeps attacker cost low by requiring only one shadow model and one reference model, and it remains robust to data augmentation. Comprehensive ablations cover privacy mechanisms, overfitting, data sizes, architectures, optimizers, and feature contributions. These results have significant practical implications for auditing privacy leakage in ML systems and for informing defenses against MIAs, and they show a favorable balance between attack effectiveness and computational efficiency.
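The feature set described above can be illustrated with a minimal NumPy sketch. It assumes that per-record cross-entropy losses on the shadow (or target) model and on the reference model are already available. The neighborhood feature is a hypothetical placeholder because its exact definition is not given here. A logistic-regression model fitted by gradient descent stands in for the paper's neural-network attack classifier. The names `ldc_features`, `train_attack_classifier` and `member_probability` are illustrative only.

```python
import numpy as np


def ldc_features(target_loss, ref_loss, labels, neighbor_score, num_classes):
    """Build one feature row per record, loosely following the LDC-MIA feature set.

    target_loss: cross-entropy loss of each record on the shadow/target model.
    ref_loss: cross-entropy loss of the same record on the reference model.
    neighbor_score: HYPOTHETICAL neighborhood feature (e.g. a mean membership
        score over perturbed copies of the record); its definition is assumed.
    """
    score = -np.asarray(target_loss, dtype=float)            # membership score = -loss
    calibrated = score + np.asarray(ref_loss, dtype=float)   # score_target - score_ref
    one_hot = np.eye(num_classes)[np.asarray(labels, dtype=int)]  # class label
    neighbor = np.asarray(neighbor_score, dtype=float)
    return np.column_stack([score, calibrated, neighbor, one_hot])


def _sigmoid(z):
    # Clip logits so np.exp never overflows for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def train_attack_classifier(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss.

    This is a stand-in for the paper's lightweight neural-network classifier.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = _sigmoid(X @ w + b) - y          # d(log-loss)/d(logit) per record
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def member_probability(X, w, b):
    """Predicted probability that each record was a training member."""
    return _sigmoid(np.asarray(X, dtype=float) @ w + b)
```

In this sketch the classifier would be fitted on records of the shadow model, whose membership is known, and then applied to feature rows computed on the target model. If features have very different scales, standardizing the columns before training is advisable.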

Abstract

Machine learning models, in particular deep neural networks, are now an integral part of applications ranging from healthcare to finance. However, training these models on sensitive data raises privacy and security concerns. Membership Inference Attacks (MIAs), which allow an adversary to determine whether a specific data point was part of a model's training dataset, have emerged as a way to verify whether trained models preserve privacy. Although many MIAs have been proposed in the literature, only a few achieve a high True Positive Rate (TPR) in the low False Positive Rate (FPR) region (0.01% to 1%), a crucial property for an MIA to be useful in real-world settings. In this paper, we present a novel MIA aimed at significantly improving TPR at low FPR. Our method, learning-based difficulty calibration for MIA (LDC-MIA), characterizes data records by their hardness levels and uses a neural network classifier to determine membership. Experimental results show that LDC-MIA improves TPR at low FPR by up to 4x compared with other difficulty-calibration-based MIAs, and it achieves the highest Area Under the ROC Curve (AUC) across all datasets. Its cost is comparable to that of most existing MIAs and orders of magnitude lower than that of LiRA, one of the state-of-the-art methods, while achieving similar performance.
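The low-FPR metric the abstract emphasizes can be made concrete with a short sketch. For an attack that flags a record as a member when its score exceeds a threshold, the sketch picks the lowest threshold that admits at most `max_fpr` × (number of non-members) false positives. It then reports the fraction of members above that threshold. The function name `tpr_at_fpr` is hypothetical, and a "score > threshold" decision rule is assumed.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Best TPR of a 'score > threshold' attack whose FPR does not exceed max_fpr."""
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    k = int(np.floor(max_fpr * len(nonmembers)))  # number of false positives allowed
    if k >= len(nonmembers):
        return 1.0  # every non-member may be flagged, so every member can be too
    # At most k non-members lie strictly above the (k+1)-th largest non-member
    # score, and any lower threshold would admit k+1 false positives.
    threshold = nonmembers[k]
    return float(np.mean(members > threshold))
```

One practical consequence follows from `k`. Measuring TPR at an FPR of 0.01% requires at least 10,000 non-member records. Otherwise, no false positive is tolerated at all.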

Paper Structure

This paper contains 34 sections, 8 equations, 35 figures, and 9 tables.

Figures (35)

  • Figure 1: Data records of the Airplane and Cat classes in the CIFAR-10 dataset, each represented by a marker indicating its membership type. The x-axis shows the membership score on the target model, defined as the negative cross-entropy loss. The y-axis shows the calibrated membership score, defined as the difference between a record's membership score on the target model and on the reference model. In traditional MIAs, records with higher membership scores are more likely to be members, whereas in difficulty-calibration-based MIAs, records with higher calibrated membership scores are more likely to be members.
  • Figure 2: Histogram of the calibrated membership scores of members and non-members. The calibrated membership scores correspond to those in the data-category figure (fig:data-category).
  • Figure 3: Based on the calibrated membership scores and membership scores on the target model (VGG-16), we group the target samples from the CIFAR-10 dataset into five categories: hard-to-predict member/non-member, easy-to-predict member/non-member, and hard-to-calibrate non-member.
  • Figure 4: Comparison of TPR in the low-FPR region (< 0.01%) for attacks on the CIFAR-10 dataset using different membership scores.
  • Figure 5: Membership scores of data records with different labels in CIFAR-10.
  • ...and 30 more figures
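The two scores defined in the Figure 1 caption can be computed from model logits as sketched below. The membership score is the negative cross-entropy loss, and the calibrated score is the target-model score minus the reference-model score. The logits in the usage check are made-up values, not data from the paper. The function names are illustrative.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Per-record cross-entropy loss using a numerically stable log-softmax."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), np.asarray(labels, dtype=int)]


def membership_score(logits, labels):
    """Membership score: negative cross-entropy loss (higher means more member-like)."""
    return -cross_entropy(logits, labels)


def calibrated_score(target_logits, ref_logits, labels):
    """Calibrated score: membership score on the target minus that on the reference model."""
    return membership_score(target_logits, labels) - membership_score(ref_logits, labels)
```

Consider a hard record that has a high loss on the target model but an even higher loss on the reference model. It receives a low raw membership score, yet a positive calibrated score. An easy record that both models fit equally well has a high raw score but a calibrated score near zero. This is the contrast between the two kinds of attacks that the caption describes.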