Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks

Yu He; Boheng Li; Yao Wang; Mengda Yang; Juan Wang; Hongxin Hu; Xingyu Zhao

TL;DR

This work challenges the dominance of difficulty calibration in high-precision membership inference attacks by exposing its limitations and proposing RAPID, a shortcut that directly reuses original membership scores to correct calibration errors. RAPID trains a shadow model and a small set of reference models to generate both original and calibrated signals, then learns a scoring function to map these signals to final membership scores, achieving strong performance with far lower computational cost than prior methods. Empirical results across 9 datasets and 5 architectures (and preliminary LLM experiments) show RAPID surpasses state-of-the-art offline attacks in key metrics (TPR@0.1% FPR, AUC, Balanced Accuracy) while reducing query and training costs by large factors. The findings highlight persistent privacy risks in practical scenarios and suggest a new direction for evaluating and mitigating membership leakage beyond traditional difficulty calibration.
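The pipeline summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the raw and calibrated signals are simulated with synthetic Gaussians rather than obtained from real shadow and reference models, a logistic regression stands in for the learned scoring function, and the names `simulate_signals`, `fit_scoring_function`, `final_score`, and `tpr_at_fpr` are hypothetical.

```python
import numpy as np

def simulate_signals(rng, n):
    """Synthetic stand-in (assumption) for [raw, calibrated] signals.

    In a real attack these would come from querying a shadow or target model
    and a small set of reference models.
    """
    members = np.column_stack([rng.normal(-0.1, 0.3, n), rng.normal(1.0, 0.5, n)])
    nonmembers = np.column_stack([rng.normal(-1.0, 0.8, n), rng.normal(0.0, 0.5, n)])
    features = np.vstack([members, nonmembers])
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    return features, labels

def _with_bias(features):
    # Append a constant column so the scorer can learn an offset.
    return np.hstack([features, np.ones((len(features), 1))])

def fit_scoring_function(features, labels, lr=0.1, steps=2000):
    """Fit a logistic-regression scorer on [raw, calibrated] features."""
    X = _with_bias(features)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - labels) / len(labels)  # gradient of the mean log loss
    return w

def final_score(w, features):
    """Map [raw, calibrated] features to a final membership score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ w))

def tpr_at_fpr(scores, labels, target_fpr=0.001):
    """TPR at a threshold whose FPR does not exceed target_fpr."""
    neg = np.sort(scores[labels == 0])[::-1]      # non-member scores, descending
    k = int(np.floor(target_fpr * len(neg)))      # number of false positives allowed
    threshold = neg[k]                            # predicting "score > threshold" yields at most k FPs
    return float(np.mean(scores[labels == 1] > threshold))

rng = np.random.default_rng(0)
shadow_features, shadow_labels = simulate_signals(rng, 500)   # membership known to the attacker
target_features, target_labels = simulate_signals(rng, 500)   # labels used only for evaluation

w = fit_scoring_function(shadow_features, shadow_labels)
scores = final_score(w, target_features)
tpr = tpr_at_fpr(scores, target_labels, target_fpr=0.001)
```

The scorer is fitted on shadow-model signals, where membership is known, and then applied to the target model's signals. Feeding it the raw score alongside the calibrated one is what lets it discount samples whose calibrated score is high only because of calibration error.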

Abstract

The vulnerability of machine learning models to Membership Inference Attacks (MIAs) has garnered considerable attention in recent years. These attacks determine whether a data sample belongs to the model's training set or not. Recent research has focused on reference-based attacks, which leverage difficulty calibration with independently trained reference models. While empirical studies have demonstrated its effectiveness, there is a notable gap in our understanding of the circumstances under which it succeeds or fails. In this paper, we take a further step towards a deeper understanding of the role of difficulty calibration. Our observations reveal inherent limitations in calibration methods, leading to the misclassification of non-members and suboptimal performance, particularly on high-loss samples. We further identify that these errors stem from an imperfect sampling of the potential distribution and a strong dependence of membership scores on the model parameters. By shedding light on these issues, we propose RAPID: a query-efficient and computation-efficient MIA that directly Re-leverAges the original membershiP scores to mItigate the errors in Difficulty calibration. Our experimental results, spanning 9 datasets and 5 model architectures, demonstrate that RAPID outperforms previous state-of-the-art attacks (e.g., LiRA and Canary offline) across different metrics while remaining computationally efficient. Our observations and analysis challenge the current de facto paradigm of difficulty calibration in high-precision inference, encouraging greater attention to the persistent risks posed by MIAs in more practical scenarios.
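To make the calibration error concrete, here is a small sketch using a common formulation of difficulty calibration: the calibrated score is the target model's score minus the mean score over reference models. The negated loss as the raw score, the function names, and all loss values are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def raw_score(loss):
    # Assumption: higher score means "more member-like", so use the negated loss.
    return -np.asarray(loss, dtype=float)

def calibrated_score(target_loss, reference_losses):
    """Difficulty calibration: target score minus the mean reference score.

    target_loss:      shape (n_samples,)
    reference_losses: shape (n_reference_models, n_samples)
    """
    reference_mean = raw_score(reference_losses).mean(axis=0)
    return raw_score(target_loss) - reference_mean

# Hypothetical losses for three samples on the target model.
target_loss = np.array([0.001, 1.5, 1.5])

# Hypothetical losses of two reference models on the same samples.
# With so few reference models, the mean is a noisy estimate of difficulty.
reference_losses = np.array([
    [0.9, 1.4, 3.0],
    [1.1, 1.6, 3.4],
])

raw = raw_score(target_loss)
calibrated = calibrated_score(target_loss, reference_losses)
```

In this toy setup the third sample has a large loss on the target model, yet its calibrated score exceeds that of the first, low-loss sample, because the two reference models happen to do even worse on it. An attack that thresholds only the calibrated score would rank it as the most member-like of the three, while the raw score still shows that the target model fits it poorly.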

Paper Structure (52 sections, 7 equations, 17 figures, 16 tables)

Introduction
Background
Machine Learning
Membership Inference Attacks
Definition of MIAs.
Attack Method.
Metrics.
Rethinking Difficulty Calibration
Difficulty Calibration.
Limitations.
Design Intuition.
Attack Methodology
Threat Model
Attack Method
Shadow Model Training.
...and 37 more sections

Figures (17)

Figure 1: The distribution of raw membership scores and calibrated membership scores. All the samples with different losses obtained from the target VGG16 model are divided into three ranges: 'small loss' [0, 0.002), 'medium loss' [0.002, 1), and 'large loss' [1, $\infty$). The target model is trained on the CIFAR-10 dataset. Difficulty calibration significantly increases the membership scores of some non-member samples that originally had medium or large losses.
Figure 2: General attack pipeline of our RAPID.
Figure 3: The frequency distributions of final scores and calibrated scores, which were sampled using a VGG16 model trained on CIFAR-10.
Figure 4: The ROC curves of attack results on VGG16 models trained on four benchmark datasets.
Figure 5: Attack performance of prior works and our attack against a DenseNet121 model trained on CIFAR-10 using DP-SGD. The noise multiplier $\sigma$ is set to 0.1. Additional attack results for other values of $\sigma$ can be found in the appendix (Attacking DP-SGD).
...and 12 more figures
