Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models
Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han
TL;DR
This work investigates privacy risks in text-to-image diffusion models by showing that memorization in such models is more pronounced at the structure level than at the pixel level. It introduces a structure-based Membership Inference Attack that exploits the early diffusion stages, where member images retain more structural information: DDIM inversion is used to obtain a reconstructed image, which is compared with the original via a structural similarity metric. The method, which can also rely on prompt-derived text, achieves state-of-the-art AUC and attack success across Latent Diffusion and Stable Diffusion, and it is robust to distortions and to variations in textual prompts. The results highlight a practical privacy concern for training data and offer a robust, forward-diffusion-based tool for detecting training membership in real-world settings.
Abstract
With the rapid advancement of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may train diffusion models on unauthorized data. Such data are at risk of being memorized by the models, potentially violating citizens' privacy rights. Membership Inference Attacks (MIAs), which judge whether a specific image was part of a model's training set, have therefore been proposed as a tool for privacy protection. Current MIA methods predominantly rely on pixel-wise comparisons as distinguishing clues, on the assumption that diffusion models memorize at the pixel level. However, it is practically impossible for text-to-image models to memorize all the pixel-level information in massive training sets. We therefore turn to structure-level memorization. Observations on the diffusion process show that the structures of members are better preserved than those of nonmembers, indicating that diffusion models can remember the structures of member images from their training sets. Drawing on these insights, we propose a simple yet effective MIA method tailored for text-to-image diffusion models. Extensive experimental results validate the efficacy of our approach. Compared to current pixel-level baselines, our approach not only achieves state-of-the-art performance but also demonstrates remarkable robustness against various distortions.
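The structure-level comparison described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the original image and its diffusion-corrupted reconstruction are already available as flat lists of grayscale intensities, it uses a simplified single-window SSIM, and the names `global_ssim` and `predict_member` and the threshold `0.9` are hypothetical choices made for illustration only.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length sequences of pixel intensities.

    Simplification: standard SSIM averages this quantity over local windows;
    here the whole image is treated as one window.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def predict_member(original, reconstructed, threshold=0.9):
    """Flag an image as a training member when its structure is well preserved.

    The threshold is a hypothetical value; in practice it would be calibrated
    on images whose membership status is known.
    """
    return global_ssim(original, reconstructed) >= threshold


# Toy data (assumed, not from the paper): one reconstruction keeps the
# structure of the original, the other scrambles it.
original = [0, 64, 128, 192, 255, 192, 128, 64]
well_preserved = [2, 60, 130, 190, 250, 195, 125, 66]
poorly_preserved = [255, 0, 192, 64, 128, 64, 192, 128]

score_preserved = global_ssim(original, well_preserved)
score_scrambled = global_ssim(original, poorly_preserved)
```

Under these assumptions, a higher score means the reconstruction kept more of the original structure, which is the signal the abstract associates with training members. A full attack would replace the toy lists with the model's actual reconstructions and sweep the threshold to report AUC.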
