Membership Inference Attack with Partial Features
Xurun Wang, Guangrui Liu, Xinjie Li, Haoyu He, Lin Yao, Zhongyun Hua, Weizhe Zhang
TL;DR
This work introduces Partial Feature Membership Inference (PFMI), a pragmatic threat model where an attacker only observes a subset of features and must decide if that subset was part of the training data. It proposes MRAD, a two-stage attack that first reconstructs missing features using memory-guided optimization and then applies anomaly detection to assess distributional conformity, applicable in both white-box and black-box settings. Empirical results across image and tabular datasets show that MRAD achieves meaningful performance even with substantial feature missingness (e.g., AUC around 0.75 on STL-10 with 60% missing) and remains effective under zeroth-order gradient estimation. The findings highlight notable privacy risks from partial information and emphasize the need for defenses that protect high-importance features and account for partial-data leakage.
Abstract
Machine learning models are vulnerable to membership inference attacks, which can be used to determine whether a given sample appears in the training data. Most existing methods assume the attacker has full access to the features of the target sample. This assumption, however, does not hold in many real-world scenarios where only partial features are available, which limits the applicability of these methods. In this work, we introduce Partial Feature Membership Inference (PFMI), a scenario where the adversary observes only partial features of each sample and aims to infer whether this observed subset was present in the training set. To address this problem, we propose MRAD (Memory-guided Reconstruction and Anomaly Detection), a two-stage attack framework that works in both white-box and black-box settings. In the first stage, MRAD leverages the latent memory of the target model to reconstruct the unknown features of the sample. We observe that when the known features are absent from the training set, the reconstructed sample deviates significantly from the true data distribution. Consequently, in the second stage, we use anomaly detection algorithms to measure the deviation between the reconstructed sample and the training data distribution, thereby determining whether the known features belong to a member of the training set. Empirical results demonstrate that MRAD is effective across various datasets and remains compatible with off-the-shelf anomaly detection techniques. For example, on STL-10, our attack achieves an AUC of around 0.75 even with 60% of the features missing.
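To make the two-stage idea concrete, the following is a minimal toy sketch and not the MRAD implementation. It rests on loud assumptions: the "target model" is a hypothetical stand-in (`query_loss`) whose loss is the squared distance to the nearest memorized training point, the attacker holds auxiliary samples from the same distribution (`REFERENCE`), missing features are filled in by gradient descent with finite-difference (zeroth-order) gradient estimates, and the anomaly detector is a simple nearest-neighbor distance. All names, data, and hyperparameters are illustrative.

```python
import math

# Hypothetical stand-in for a target model that has memorized its training set:
# the queried "loss" is the squared distance to the nearest training point.
TRAIN = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)]  # toy data on the line x1 = 0.5 * x0

def query_loss(x):
    return min(sum((a - b) ** 2 for a, b in zip(x, t)) for t in TRAIN)

def zeroth_order_grad(loss_fn, x, idx, eps=1e-3):
    """Central finite-difference estimate of the loss gradient on coordinates idx."""
    grad = {}
    for i in idx:
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad[i] = (loss_fn(up) - loss_fn(down)) / (2 * eps)
    return grad

def reconstruct(loss_fn, known, dim, steps=200, lr=0.1):
    """Stage 1: keep known features fixed, descend the loss on the missing ones."""
    missing = [i for i in range(dim) if i not in known]
    x = [known.get(i, 0.0) for i in range(dim)]  # missing features start at 0.0
    for _ in range(steps):
        grad = zeroth_order_grad(loss_fn, x, missing)
        for i in missing:
            x[i] -= lr * grad[i]
    return x

def anomaly_score(x, reference):
    """Stage 2: distance to the nearest reference sample (larger = more anomalous)."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, r))) for r in reference)

# Attacker's auxiliary samples from the same distribution (assumed available).
REFERENCE = [(0.5 * j, 0.25 * j) for j in range(9)]

# Feature 0 is observed, feature 1 is missing.
member_score = anomaly_score(reconstruct(query_loss, {0: 2.0}, dim=2), REFERENCE)
nonmember_score = anomaly_score(reconstruct(query_loss, {0: 3.0}, dim=2), REFERENCE)
print(member_score, nonmember_score)
```

In this toy setting the member's reconstruction lands back on the data line, while the non-member's reconstruction is pulled toward a memorized training point and ends up off the line, so its anomaly score is larger. A real attack would threshold such scores; how the threshold, objective, and detector are chosen is not specified in this section.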
