Membership Inference Attacks Against Text-to-image Generation Models
Yixin Wu, Ning Yu, Zheng Li, Michael Backes, Yang Zhang
TL;DR
This work presents the first privacy analysis of text-to-image generation models. It formulates membership inference in a black-box setting and proposes four attacks built on three key intuitions about overfitting. Evaluated on a diffusion-based model (LDM) and a sequence-to-sequence model (DALL-E mini), the attacks reveal strong leakage: semantic-level attacks achieve near-perfect accuracy and remain robust across ablations. The findings show that for member samples the model generates images more faithful to the query semantics, and that the privacy risk remains substantial even when the adversary has limited auxiliary data, which highlights the need for defenses. The study provides a foundation for understanding and mitigating membership privacy risks in text-conditioned image synthesis, and it tells developers and researchers which factors to monitor.
Abstract
Text-to-image generation models have recently attracted unprecedented attention as they unlock imaginative applications in all areas of life. However, developing such models requires huge amounts of data that might contain privacy-sensitive information, e.g., face identity. While privacy risks have been extensively demonstrated in the image classification and GAN generation domains, privacy risks in the text-to-image generation domain are largely unexplored. In this paper, we perform the first privacy analysis of text-to-image generation models through the lens of membership inference. Specifically, we propose three key intuitions about membership information and design four attack methodologies accordingly. We conduct comprehensive evaluations on two mainstream text-to-image generation models, one based on sequence-to-sequence modeling and one based on diffusion-based modeling. The empirical results show that all of the proposed attacks achieve strong performance, in some cases with accuracy close to 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks. We further conduct an extensive ablation study to analyze the factors that may affect attack performance, which can alert developers and researchers to vulnerabilities in text-to-image generation models. All these findings indicate that our proposed attacks pose a realistic privacy threat to text-to-image generation models.
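To make the semantic-level intuition concrete, the following is a minimal sketch and not the attack pipeline evaluated in the paper. It assumes the adversary has already queried the target model with a caption and embedded both the generated image and the candidate image with some image encoder; the function names, the toy embedding vectors, and the fixed threshold of 0.8 are all hypothetical choices made for illustration. A pair is flagged as a member when the generated image is semantically close to the candidate image.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Degenerate embedding: treat as carrying no semantic agreement.
        return 0.0
    return float(np.dot(a, b) / denom)


def infer_membership(generated_emb, candidate_emb, threshold=0.8):
    """Predict 'member' when the generated image stays close to the candidate.

    The threshold is a hypothetical value; in practice it would have to be
    calibrated on auxiliary data available to the adversary.
    """
    return cosine_similarity(generated_emb, candidate_emb) >= threshold


# Toy embeddings (assumed, not produced by a real encoder).
candidate = np.array([1.0, 0.0, 0.0])
faithful_generation = np.array([0.9, 0.1, 0.0])    # close to the candidate
unrelated_generation = np.array([0.1, 0.9, 0.3])   # far from the candidate

print(infer_membership(faithful_generation, candidate))
print(infer_membership(unrelated_generation, candidate))
```

A fixed threshold is the simplest possible decision rule and is used here only because it keeps the sketch short; a learned classifier over such similarity features would be a natural alternative.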
