Membership Inference Attack Against Masked Image Modeling

Zheng Li, Xinlei He, Ning Yu, Yang Zhang

TL;DR

This paper addresses the privacy risks of pre-training data in Masked Image Modeling (MIM) by introducing the first membership inference attack tailored to MIM. The core idea is to simulate MIM's masking and reconstruction process and use the distance between reconstructed and original images as a membership signal, with a threshold learned from a shadow dataset. Through extensive experiments on ViT-based encoders and multiple datasets, the attack surpasses existing baselines and reveals how factors like model complexity, mask ratio, and pre-training epochs affect leakage. The work also explores adversary knowledge relaxations and defenses, highlighting practical implications for safeguarding MIM pre-training data in real-world deployments.
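The thresholding step can be illustrated with a minimal sketch. It assumes the adversary has already computed reconstruction distances for shadow members and shadow non-members. The selection rule used here (sweeping candidate thresholds and keeping the one with the best balanced accuracy) is an assumption for illustration, as are the function names `fit_threshold` and `infer_membership` and the toy distance values; the summary does not state how the paper chooses its threshold.

```python
from typing import Sequence, Tuple


def fit_threshold(member_dists: Sequence[float],
                  nonmember_dists: Sequence[float]) -> Tuple[float, float]:
    """Pick the distance threshold that best separates shadow members
    from shadow non-members.

    A sample is predicted to be a member when its distance is <= threshold.
    Returns (threshold, balanced accuracy on the shadow data).
    """
    candidates = sorted(set(member_dists) | set(nonmember_dists))
    best_t, best_acc = candidates[0], 0.0
    for t in candidates:
        # Fraction of members correctly below the threshold.
        tpr = sum(d <= t for d in member_dists) / len(member_dists)
        # Fraction of non-members correctly above the threshold.
        tnr = sum(d > t for d in nonmember_dists) / len(nonmember_dists)
        acc = 0.5 * (tpr + tnr)
        if acc > best_acc:  # ties keep the smaller threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


def infer_membership(distance: float, threshold: float) -> bool:
    """Lower reconstruction distance is treated as evidence of membership."""
    return distance <= threshold


# Toy shadow distances (made up for illustration, not results from the paper).
shadow_members = [0.10, 0.12, 0.15, 0.20]
shadow_nonmembers = [0.18, 0.25, 0.30, 0.35]
threshold, acc = fit_threshold(shadow_members, shadow_nonmembers)
```

The learned threshold is then applied unchanged to distances measured on the target encoder, which is why the match between shadow and target distributions matters for the attack.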

Abstract

Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition. The image encoder pre-trained through MIM, which involves masking input images and then reconstructing them, attains state-of-the-art performance in various downstream vision tasks. However, most existing works focus on improving the performance of MIM. In this work, we take a different angle by studying the pre-training data privacy of MIM. Specifically, we propose the first membership inference attack against image encoders pre-trained by MIM, which aims to determine whether an image is part of the MIM pre-training dataset. The key design is to simulate the pre-training paradigm of MIM, i.e., image masking and subsequent reconstruction, and then obtain reconstruction errors. These reconstruction errors can serve as membership signals, because the encoder reconstructs images from its training set with lower error than unseen images. Extensive evaluations are conducted on three model architectures and three benchmark datasets. Empirical results show that our attack outperforms baseline methods. Additionally, we conduct detailed ablation studies to analyze multiple factors that could influence the performance of the attack.
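The membership signal described above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: the image is assumed to be pre-split into a `(num_patches, patch_dim)` array, `reconstruct` is a hypothetical callable standing in for the target encoder plus a decoder, and the error is taken as mean squared error on the masked patches, averaged over a few random masks. The toy `mean_fill` reconstructor exists only to make the example runnable.

```python
import numpy as np


def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Return a boolean vector that is True for the patches to mask."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def reconstruction_error(patches: np.ndarray, reconstruct,
                         mask_ratio: float = 0.75, trials: int = 4,
                         seed: int = 0) -> float:
    """Average masked-patch MSE over several random masks.

    patches:     array of shape (num_patches, patch_dim)
    reconstruct: hypothetical callable (visible_patches, mask) -> predicted
                 patches with the same shape as `patches`
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        mask = random_patch_mask(patches.shape[0], mask_ratio, rng)
        # Zero out masked patches so the model cannot see them.
        visible = np.where(mask[:, None], 0.0, patches)
        predicted = reconstruct(visible, mask)
        # Score only the patches the model had to fill in.
        errors.append(float(np.mean((predicted[mask] - patches[mask]) ** 2)))
    return float(np.mean(errors))


def mean_fill(visible: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy stand-in for a real MIM model: fill masked patches with the
    mean of the visible ones."""
    out = visible.copy()
    out[mask] = visible[~mask].mean(axis=0)
    return out


flat_image = np.ones((16, 4))
noisy_image = np.random.default_rng(1).normal(size=(16, 4))
flat_error = reconstruction_error(flat_image, mean_fill)
noisy_error = reconstruction_error(noisy_image, mean_fill)
```

With a real pre-trained encoder in place of `mean_fill`, images from the pre-training set would be expected to yield lower values than unseen images, and that gap is what the attack exploits.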

Paper Structure (24 sections, 8 figures, 7 tables)


Figures (8)

  • Figure 1: An illustration of the pre-training stage and downstream stage of Masked Image Modeling (MIM).
  • Figure 2: Overview of our attack mechanism against Masked Image Modeling (MIM).
  • Figure 3: The distribution of distance between the reconstructed images and original images for members and non-members.
  • Figure 4: Attack performance under the relaxation of the assumption that the shadow dataset shares the same distribution as the target dataset.
  • Figure 5: Attack performance under the relaxation of the assumption that the shadow encoder shares the same mask ratio as the target encoder. The pre-training dataset is TinyImageNet.
  • ...and 3 more figures