A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability

Jie Zhu; Jirong Zha; Ding Li; Leye Wang

TL;DR

The paper tackles privacy risks in visual self-supervised learning (SSL) by proposing PartCrop, a unified membership inference (MI) method that operates in a black-box setting where the training recipe is unknown. It exploits a part-aware capability shared across SSL paradigms: it queries the encoder with image parts and analyzes the distribution of their responses to form membership features, which a simple attacker then learns from. Experiments across MAE, DINO, and MoCo on multiple datasets show that PartCrop outperforms baselines, including EncoderMI, and generalizes to additional SSL paradigms. Three defenses are evaluated: early stopping, differential privacy, and a novel shrinking crop scale range (SCSR); SCSR often provides strong privacy gains at an acceptable utility cost. Overall, PartCrop offers a practical, cross-paradigm MI framework for SSL models and highlights actionable defense strategies for deployed systems.

Abstract

Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also raises significant privacy concerns, especially in vision. In this paper, we perform membership inference on visual self-supervised models in a more realistic setting: the self-supervised training method and its details are unknown to the adversary, who usually faces a black-box system in practice. In this setting, a self-supervised model could have been trained with completely different paradigms, e.g., masked image modeling or contrastive learning, each with complex training details, so we propose a unified membership inference method called PartCrop. It is motivated by the part-aware capability shared among these models and by their stronger part response on training data. Specifically, PartCrop crops parts of objects in an image and queries their responses with the image in representation space. We conduct extensive attacks on self-supervised models with different training protocols and structures using three widely used image datasets. The results verify the effectiveness and generalization of PartCrop. Moreover, to defend against PartCrop, we evaluate two common approaches, i.e., early stopping and differential privacy, and propose a tailored method called shrinking crop scale range. The defense experiments indicate that all of them are effective. Our code is available at https://github.com/JiePKU/PartCrop.
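
The part-query idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a grayscale image stored as a list of rows, interprets the crop scale as a fraction of the image area, and uses a hypothetical `toy_encoder` in place of the black-box self-supervised encoder. Reducing the responses to sorted cosine similarities is likewise an illustrative choice, and all function names here are made up for the example.

```python
import math
import random


def random_part_crops(image, num_crops, scale=(0.08, 0.2), rng=None):
    """Sample square crops covering a random fraction of the image area.

    `scale` is treated as an (assumed) area-fraction range.
    """
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    crops = []
    for _ in range(num_crops):
        frac = rng.uniform(scale[0], scale[1])
        side = max(1, min(h, w, round(math.sqrt(frac * h * w))))
        top = rng.randint(0, h - side)
        left = rng.randint(0, w - side)
        crops.append([row[left:left + side] for row in image[top:top + side]])
    return crops


def toy_encoder(patch):
    """Hypothetical stand-in for the black-box encoder: pixel statistics."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var), max(pixels), min(pixels)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def membership_feature(image, encoder, num_crops=8, scale=(0.08, 0.2), rng=None):
    """Query the encoder with part crops and return sorted part-to-image similarities."""
    image_feat = encoder(image)
    crops = random_part_crops(image, num_crops, scale, rng)
    sims = [cosine(encoder(crop), image_feat) for crop in crops]
    # Sorting yields a fixed-length, order-independent summary of the responses.
    return sorted(sims, reverse=True)
```

In an attack of this shape, such feature vectors would be computed for images whose membership is known and passed to a simple classifier. The stand-in encoder would be replaced by queries to the target model.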

Paper Structure (26 sections, 9 equations, 7 figures, 9 tables, 1 algorithm)

Introduction
Background
Threat Model
Method
Experimental Setting
Self-supervised Model Introduction
Dataset Introduction
Baseline
Implementation Details
Experiment
Results in the Partial Setting
Ablation Study
Membership Feature
Crop Number
Crop Scale
...and 11 more sections

Figures (7)

Figure 1: DeiT touvron2021training uses supervised learning. MAE he2022masked and CAE chen2022context are masked image modeling based methods. DINO caron2021emerging and MoCo v3 chen2021empirical are contrastive learning based methods. iBOT zhou2021ibot combines the two paradigms. This figure is borrowed from zhu2023understanding, to which we refer interested readers.
Figure 2: Part response visualization on MAE (masked image modeling), DINO (contrastive), and MoCo (contrastive). Images are from Tiny ImageNet tinyimagenet_le2015tiny. (a), (b), (c) Similarity curves of the chair image and chair seat part on MAE, DINO, and MoCo, respectively. (d), (e), (f) Similarity curves of the dog image and dog muzzle part on MAE, DINO, and MoCo, respectively.
Figure 3: An overview of PartCrop.
Figure 4: Ablation study on crop number. We consider four different crop numbers, i.e., 32, 64, 128, and 256.
Figure 5: Ablation study on crop scale. We consider five different crop scales, i.e., $(0.08,\; 0.1)$, $(0.08,\; 0.2)$, $(0.08,\; 0.3)$, $(0.01,\; 0.03)$, and $(0.5,\; 1.0)$.
...and 2 more figures
