Membership Inference Attacks Against Text-to-image Generation Models

Yixin Wu; Ning Yu; Zheng Li; Michael Backes; Yang Zhang

TL;DR

This work pioneers privacy analysis for text-to-image generation by formulating membership inference in a black-box setting and proposing four attacks based on three key intuitions about overfitting. By evaluating on diffusion-based (LDM) and sequence-to-sequence (DALL-E mini) models, it demonstrates strong leakage, with semantic-level attacks achieving near-perfect accuracy and robust performance across ablations. The findings reveal that members produce images more faithful to query semantics and that privacy risks are substantial even with limited auxiliary data, highlighting the need for defenses. Overall, the study provides a foundation for understanding and mitigating membership privacy risks in text-conditioned image synthesis and informs developers and researchers about vulnerable factors to monitor.

Abstract

Text-to-image generation models have recently attracted unprecedented attention as they unlock imaginative applications in all areas of life. However, developing such models requires huge amounts of data that might contain privacy-sensitive information, e.g., face identity. While privacy risks have been extensively demonstrated in the image classification and GAN generation domains, privacy risks in the text-to-image generation domain are largely unexplored. In this paper, we perform the first privacy analysis of text-to-image generation models through the lens of membership inference. Specifically, we propose three key intuitions about membership information and design four attack methodologies accordingly. We conduct comprehensive evaluations on two mainstream text-to-image generation models, covering sequence-to-sequence modeling and diffusion-based modeling. The empirical results show that all of the proposed attacks can achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks. We further conduct an extensive ablation study to analyze the factors that may affect the attack performance, which can guide developers and researchers to be alert to vulnerabilities in text-to-image generation models. All these findings indicate that our proposed attacks pose a realistic privacy threat to text-to-image generation models.
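
The observation in the TL;DR that members produce images more faithful to the query semantics can be illustrated with a small sketch. This is not the paper's attack implementation: it is a simplified, hypothetical threshold rule that assumes the adversary has already obtained embedding vectors for the query image and for the image the target model generated from its caption. The function names, the toy vectors, and the threshold value of 0.8 are all assumptions made for illustration; in practice the decision boundary would have to be calibrated on auxiliary data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def infer_membership(
    query_emb: Sequence[float],
    generated_emb: Sequence[float],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> bool:
    """Hypothetical rule: predict 'member' when the generated image is
    semantically close to the query image."""
    return cosine_similarity(query_emb, generated_emb) >= threshold


# Toy embeddings standing in for the output of an image encoder.
query = [1.0, 0.0, 1.0]
close_generation = [0.9, 0.1, 1.1]  # nearly parallel to the query
far_generation = [0.0, 1.0, 0.0]    # orthogonal to the query

print(infer_membership(query, close_generation))
print(infer_membership(query, far_generation))
```

A fixed threshold is the simplest possible decision rule; it is used here only to make the intuition concrete, and a learned classifier over such similarity features would be an equally plausible design.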

Paper Structure (16 sections, 9 figures, 3 tables)

This paper contains 16 sections, 9 figures, 3 tables.

Introduction
Background
Text-to-image Generation Models
Membership Inference Attacks
Problem Statement
Text-to-image Generation
Threat Model
Methodology
Intuitions
Attack Methodologies
Evaluation
Experimental Setup
Results
Defense
Conclusion
...and 1 more section

Figures (9)

Figure 1: Overview of our attack pipeline.
Figure 2: Test accuracy of the proposed attack methods on the (a) LDM model and (b) DALL-E mini. The caption and embedding generation tools for both cases are BLIP.
Figure 3: Test accuracy of the proposed attack methods with varying denoising steps on the LDM model. The non-member dataset of (a) is MSCOCO-Face and of (b) is VG-Face. Dashed lines represent pixel-level attacks and solid lines represent semantic-level attacks.
Figure 4: Test accuracy of the proposed attack methods on the LDM model. The caption and embedding generation tools are ClipCap/CLIP.
Figure 5: Difference in FID scores between member and non-member datasets.
...and 4 more figures
