Membership Inference Attack with Partial Features

Xurun Wang, Guangrui Liu, Xinjie Li, Haoyu He, Lin Yao, Zhongyun Hua, Weizhe Zhang

TL;DR

This work introduces Partial Feature Membership Inference (PFMI), a pragmatic threat model where an attacker only observes a subset of features and must decide if that subset was part of the training data. It proposes MRAD, a two-stage attack that first reconstructs missing features using memory-guided optimization and then applies anomaly detection to assess distributional conformity, applicable in both white-box and black-box settings. Empirical results across image and tabular datasets show that MRAD achieves meaningful performance even with substantial feature missingness (e.g., AUC around 0.75 on STL-10 with 60% missing) and remains effective under zeroth-order gradient estimation. The findings highlight notable privacy risks from partial information and emphasize the need for defenses that protect high-importance features and account for partial-data leakage.

Abstract

Machine learning models are vulnerable to membership inference attacks, which can be used to determine whether a given sample appears in the training data. Most existing methods assume the attacker has full access to the features of the target sample. This assumption, however, does not hold in many real-world scenarios where only partial features are available, thereby limiting the applicability of these methods. In this work, we introduce Partial Feature Membership Inference (PFMI), a scenario where the adversary observes only partial features of each sample and aims to infer whether this observed subset was present in the training set. To address this problem, we propose MRAD (Memory-guided Reconstruction and Anomaly Detection), a two-stage attack framework that works in both white-box and black-box settings. In the first stage, MRAD leverages the latent memory of the target model to reconstruct the unknown features of the sample. We observe that when the known features are absent from the training set, the reconstructed sample deviates significantly from the true data distribution. Consequently, in the second stage, we use anomaly detection algorithms to measure the deviation between the reconstructed sample and the training data distribution, thereby determining whether the known features belong to a member of the training set. Empirical results demonstrate that MRAD is effective across various datasets and remains compatible with off-the-shelf anomaly detection techniques. For example, on STL-10, our attack achieves an AUC of around 0.75 even with 60% of the features missing.
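The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The reconstruction objective (descending the target model's loss over the unknown features), the central-difference gradient (a simple query-only estimator), and the k-nearest-neighbor distance used as the anomaly score are all stand-ins chosen for brevity. The names `finite_diff_grad`, `reconstruct`, `knn_score`, `toy_loss`, `memorized`, and `reference` are hypothetical.

```python
def finite_diff_grad(loss_fn, x, free_idx, h=1e-4):
    """Central-difference gradient of loss_fn over the coordinates in free_idx."""
    grad = {}
    for i in free_idx:
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad[i] = (loss_fn(up) - loss_fn(down)) / (2 * h)
    return grad


def reconstruct(loss_fn, known, dim, steps=200, lr=0.1, init=0.0):
    """Stage 1 (assumed objective): fill unknown features by descending loss_fn.

    known maps feature index -> observed value; those entries stay fixed.
    """
    x = [known.get(i, init) for i in range(dim)]
    free_idx = [i for i in range(dim) if i not in known]
    for _ in range(steps):
        grad = finite_diff_grad(loss_fn, x, free_idx)
        for i in free_idx:
            x[i] -= lr * grad[i]
    return x


def knn_score(x, reference, k=2):
    """Stage 2 stand-in: mean Euclidean distance to the k nearest reference points."""
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(x, r)) ** 0.5 for r in reference
    )
    return sum(dists[:k]) / k


# Hypothetical toy "model": its loss is lowest at one memorized sample.
memorized = [1.0, 2.0, 3.0, 4.0]


def toy_loss(x):
    return sum((a - b) ** 2 for a, b in zip(x, memorized))


# Hypothetical reference data the attacker holds from the same distribution.
reference = [[1.1, 2.0, 2.9, 4.0], [0.9, 2.1, 3.0, 4.1], [1.0, 1.9, 3.1, 3.9]]

member = reconstruct(toy_loss, {0: 1.0, 1: 2.0}, dim=4)
non_member = reconstruct(toy_loss, {0: 5.0, 1: -3.0}, dim=4)
print(knn_score(member, reference) < knn_score(non_member, reference))
```

The toy loss only exercises the mechanics: known features stay fixed, unknown features are optimized through loss queries alone, and the completed sample is scored against reference data. With a real target model, the loss surface and the choice of anomaly detector would determine how well members and non-members separate.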

Paper Structure

This paper contains 23 sections, 4 equations, 13 figures, 1 table, 2 algorithms.

Figures (13)

  • Figure 1: Our goal is to infer membership information when only partial features of a sample are available.
  • Figure 2: Attack framework: We first implement a simple feature reconstruction algorithm to obtain a complete sample, and then distinguish between member and non-member features based on their deviation distance.
  • Figure 3: Attack performance (AUC) under varying known feature proportions. The x-axis represents the percentage of known features, and the y-axis shows the corresponding attack AUC. Each curve corresponds to one of the four integrated anomaly detection methods within our attack framework. We denote the case where only one feature is unknown as “loo”.
  • Figure 4: Attack performance (TPR@0.1FPR) under varying known feature proportions. The x-axis represents the percentage of known features, and the y-axis shows the corresponding attack TPR when FPR=0.1. Each curve corresponds to one of the four integrated anomaly detection methods within our attack framework. We denote the case where only one feature is unknown as “loo”.
  • Figure 5: Attack performance on the CIFAR-10 dataset under different parameter settings. Darker blue indicates higher attack performance. The number in each cell denotes half of the range (i.e., half the max–min difference) across ten runs.
  • ...and 8 more figures
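
The two metrics named in the captions of Figures 3 and 4, AUC and TPR@0.1FPR, can be computed from per-sample attack scores as sketched below. The sketch assumes that a higher score means "more likely a member" (an anomaly score would be negated first) and that labels are 1 for members and 0 for non-members. The function names and the sample scores are illustrative, not taken from the paper.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def tpr_at_fpr(points, max_fpr=0.1):
    """Highest TPR reachable while keeping FPR at or below max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Illustrative scores: three members (label 1) and three non-members (label 0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(auc(points), tpr_at_fpr(points, 0.1))
```

TPR at a fixed low FPR complements AUC because it reports how many members the attack identifies when very few non-members are allowed to be misclassified.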

Theorems & Definitions (2)

  • Definition 1
  • Definition 2