Membership Inference Attacks for Face Images Against Fine-Tuned Latent Diffusion Models
Lauritz Christian Holme, Anton Mosquera Storgaard, Siavash Arjomand Bigdeli
TL;DR
The paper addresses privacy leakage through Membership Inference Attacks (MIAs) on Latent Diffusion Models (LDMs) that are fine-tuned on face images. It proposes a black-box MIA in which a supervised attack model $\mathbf{M_A}$ is trained on positives generated by the target model $\mathbf{M_T}$ and negatives from an auxiliary dataset. Performance improves when the negatives are themselves generated and when the images carry visible watermarks; attack efficacy also depends on the guidance scale $s$. The results show that the MIA is viable for dataset-level membership in realistic settings, with higher AUC when using generated auxiliary data and visible watermarks, while single-image membership remains challenging. The work highlights practical privacy risks of fine-tuning LDMs on face images and informs mitigation strategies such as watermarking and careful data provenance, albeit at substantial computational cost and with domain-specific limits.
Abstract
The rise of generative image models raises privacy concerns about the large datasets used to train such models. This paper investigates the possibility of inferring whether a set of face images was used for fine-tuning a Latent Diffusion Model (LDM). A Membership Inference Attack (MIA) method is presented for this task. Using generated auxiliary data for the training of the attack model leads to significantly better performance, and so does the use of watermarks. The guidance scale used for inference was found to have a significant influence. If an LDM is fine-tuned for long enough, the text prompt used for inference has no significant influence. The proposed MIA is found to be viable in a realistic black-box setup against LDMs fine-tuned on face images.
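The attack setup described above can be illustrated with a toy sketch. It is not the authors' implementation: random feature vectors stand in for face images, a small logistic regression stands in for the attack model, and the helper names (`make_features`, `train_attack_model`, `score`, `auc`), the distribution shifts, and the set-level averaging rule are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_features(n, shift, dim=16):
    """Hypothetical stand-in for image features; `shift` mimics a distribution gap."""
    return rng.normal(loc=shift, scale=1.0, size=(n, dim))

def train_attack_model(x, y, lr=0.1, epochs=300):
    """Logistic regression by gradient descent (toy stand-in for the attack model)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted membership probability
        grad = p - y                            # gradient of the log-loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def score(model, x):
    """Membership score in [0, 1] for each feature vector."""
    w, b = model
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Training data for the attack model (all synthetic, assumed for illustration):
# positives play the role of images generated by the target model,
# negatives play the role of auxiliary images.
train_pos = make_features(500, shift=0.30)
train_neg = make_features(500, shift=0.00)
x_train = np.vstack([train_pos, train_neg])
y_train = np.concatenate([np.ones(500), np.zeros(500)])
attack_model = train_attack_model(x_train, y_train)

# Evaluation: candidate member images vs. non-member images.
# Assumption: members resemble the target model's outputs, but less strongly.
members = make_features(500, shift=0.25)
nonmembers = make_features(500, shift=0.00)

# Single-image membership: one score per image, summarised by AUC.
image_auc = auc(score(attack_model, members), score(attack_model, nonmembers))

# Dataset-level membership (assumed rule): average the scores over the whole set.
member_set_score = float(score(attack_model, members).mean())
nonmember_set_score = float(score(attack_model, nonmembers).mean())
```

Averaging scores over a candidate set reduces per-image noise, which is one plausible reason a dataset-level decision can be easier than a single-image one; the paper's own aggregation procedure may differ.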
