A Unified Perspective on Adversarial Membership Manipulation in Vision Models

Ruize Gao, Kaiwen Zhou, Yongqiang Chen, Feng Liu

Abstract

Membership inference attacks (MIAs) aim to determine whether a specific data point was part of a model's training set, serving as effective tools for evaluating privacy leakage of vision models. However, existing MIAs implicitly assume honest query inputs, and their adversarial robustness remains unexplored. We show that MIAs for vision models expose a previously overlooked adversarial surface: adversarial membership manipulation, where imperceptible perturbations can reliably push non-member images into the "member" region of state-of-the-art MIAs. In this paper, we provide the first unified perspective on this phenomenon by analyzing its mechanism and implications. We begin by demonstrating that adversarial membership fabrication is consistently effective across diverse architectures and datasets. We then reveal a distinctive geometric signature - a characteristic gradient-norm collapse trajectory - that reliably separates fabricated from true members despite their nearly identical semantic representations. Building on this insight, we introduce a principled detection strategy grounded in gradient-geometry signals and develop a robust inference framework that substantially mitigates adversarial manipulation. Extensive experiments show that fabrication is broadly effective, while our detection and robust inference strategies significantly enhance resilience. This work establishes the first comprehensive framework for adversarial membership manipulation in vision models.
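To make the threat model concrete, here is a minimal sketch of a fabrication-style perturbation loop, assuming (as Figure 3 below suggests) that the attack minimizes the model's loss within an $\ell_\infty$ $\epsilon$-ball to push a non-member into a high-confidence region. `model`, the step size `alpha`, and the cross-entropy objective are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a member-fabrication-style perturbation: a PGD-like
# loop that *descends* the loss within an L-inf eps-ball (a standard
# adversarial attack would ascend it). Illustrative only.
import torch
import torch.nn.functional as F

def fabricate_member(model, x, y, eps=2/255, alpha=0.5/255, steps=10):
    """Perturb non-member x toward the high-confidence 'member' region."""
    model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Signed gradient descent: lower the loss rather than raise it.
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into the eps-ball and the valid pixel range.
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```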

Paper Structure

This paper contains 35 sections, 1 theorem, 35 equations, 121 figures, 9 tables, and 1 algorithm.

Key Result

Theorem 1

Assuming that the $\epsilon$-ball is sufficiently small such that the local curvature of $\ell \circ f$ around $x$ is well approximated by its second-order Taylor expansion, after taking one step of signed gradient descent with respect to the input sample, there exists an $\alpha$ such that the following approximately holds:

$$\left\|\nabla_{x'}\,\ell\big(f(x'), y\big)\right\|_2 \;\le\; \left\|\nabla_{x}\,\ell\big(f(x), y\big)\right\|_2, \qquad x' = x - \alpha\,\operatorname{sign}\!\big(\nabla_{x}\,\ell(f(x), y)\big).$$
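A hedged numerical illustration of this gradient-norm decrease (not the paper's proof or reference code): compare the input-gradient norm before and after a single small signed-descent step. `model`, `x`, `y`, and the step size `alpha` below are placeholders.

```python
# Numerical check: after one small signed-gradient-descent step, the
# input-gradient norm should (approximately) not increase, per Theorem 1.
import torch
import torch.nn.functional as F

def input_grad_norm(model, x, y):
    """Return the input gradient and its per-sample L2 norm."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return grad, grad.flatten(1).norm(dim=1)

def check_grad_norm_decrease(model, x, y, alpha=1e-3):
    grad, norm_before = input_grad_norm(model, x, y)
    x_step = (x - alpha * grad.sign()).detach()
    _, norm_after = input_grad_norm(model, x_step, y)
    # Expect norm_after <= norm_before, up to second-order error.
    return norm_before, norm_after
```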

Figures (121)

  • Figure 1: Overview of the Background and Our Proposed Research Problems.
  • Figure 2: Imperceptible Adversarial Perturbations on ImageNet-100. The first row shows the original non-members, and the second row shows the corresponding perturbed fabricated members. We use $\epsilon = 2/255$ for $\mathcal{B}_\epsilon[x]$ here. The perturbations are imperceptible to the human eye, demonstrating that the Member Fabrication Attack (MFA) succeeds with only very small perturbations.
  • Figure 3: Objective of adversarial attacks (left) vs. MFA (right). The black and red dots denote the original input and the perturbed sample within the $\epsilon$-ball (gray region). Adversarial attacks push inputs into the misclassification region (orange), where $\max_{i\neq y} p_i > p_y$. In contrast, MFA drives inputs into high-confidence regions.
  • Figure 4: Visualization of the Distribution of Fabricated and True Members in Different Semantic Feature Spaces Using t-SNE (van der Maaten &amp; Hinton, 2008). The two subfigures show the semantic features at the penultimate and antepenultimate layers, with the perturbation constrained to $\|\delta\|_{\infty} \leq 4/255$. Red and blue dots denote true and fabricated members, respectively. The high degree of overlap between the red and blue dots suggests that semantic features alone are insufficient to distinguish them.
  • Figure 5: Decay of Gradient Norm with Respect to Input Across Steps. As the number of steps increases, the gradient norm with respect to the input progressively diminishes (see the sketch after this list). For clarity, a large $\epsilon$-ball ($\|\delta\|_{\infty} \leq 8/255$) is used, with a small initial step size of $1/24 \times (8/255)$, over 20 steps.
  • ...and 116 more figures
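Figure 5's decay curve hints at the gradient-geometry detection signal described in the abstract. Below is a minimal sketch that records the input-gradient-norm trajectory over a few signed-descent steps and flags samples whose norm collapses; the step schedule mirrors the figure's setup, while `collapse_ratio` and the flagging rule are illustrative assumptions rather than the paper's calibrated detector.

```python
# Hedged sketch of a gradient-geometry detection signal: track how the
# input-gradient norm evolves along a signed-descent path (cf. Figure 5)
# and flag trajectories that collapse. Thresholds are illustrative.
import torch
import torch.nn.functional as F

def grad_norm_trajectory(model, x, y, alpha=(8/255)/24, steps=20):
    """Return per-step input-gradient norms along a descent path."""
    norms = []
    x_cur = x.clone().detach()
    for _ in range(steps):
        x_cur.requires_grad_(True)
        loss = F.cross_entropy(model(x_cur), y)
        grad = torch.autograd.grad(loss, x_cur)[0]
        norms.append(grad.flatten(1).norm(dim=1).detach())
        x_cur = (x_cur.detach() - alpha * grad.sign()).clamp(0.0, 1.0)
    return torch.stack(norms)  # shape: (steps, batch)

def flag_fabricated(model, x, y, collapse_ratio=0.1):
    """Flag samples whose gradient norm collapses far below its start."""
    traj = grad_norm_trajectory(model, x, y)
    return traj[-1] < collapse_ratio * traj[0]
```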

Theorems & Definitions (7)

  • Definition 1: Membership Inference Game
  • Definition 2: The Most Adversarial Example
  • Definition 3: Member Fabrication Attack
  • Definition 4: Member Fabrication Detection
  • Theorem 1: Local Gradient-Norm Decrease Under Small-Step Fabrication
  • Definition 5: Adversarial Membership Inference Game
  • Definition 6: The Most Adversarial Example