Membership Inference Attacks for Face Images Against Fine-Tuned Latent Diffusion Models

Lauritz Christian Holme, Anton Mosquera Storgaard, Siavash Arjomand Bigdeli

TL;DR

The paper addresses privacy leakage through Membership Inference Attacks (MIAs) on Latent Diffusion Models (LDMs) that are fine-tuned on face images. It proposes a black-box MIA in which a supervised attack model $\mathbf{M_A}$ learns from positives generated by the target model $\mathbf{M_T}$ and negatives drawn from an auxiliary dataset. Performance improves when the negatives are themselves generated images and when the training images carry visible watermarks; attack efficacy also depends on the guidance scale $s$. The results show the MIA is viable for dataset-level membership in realistic settings, with higher AUC when using generated auxiliary data and visible watermarks, while single-image membership remains challenging. The work highlights practical privacy risks of fine-tuning LDMs on face images and informs mitigation strategies such as watermarking and careful data provenance, albeit at substantial computational cost and with domain-specific limits.
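
For orientation, the guidance scale $s$ can be read through the classifier-free guidance rule as it is commonly implemented for Stable Diffusion-style LDMs. This is an assumed convention and not a formula quoted from the paper, whose exact definition may differ. Here $\epsilon_\theta$ is the noise predictor, $z_t$ the noisy latent at step $t$, $c$ the text conditioning, and $\varnothing$ the empty prompt; all of these symbols are introduced only for this illustration.

$$
\tilde{\epsilon}_\theta(z_t, c) = \epsilon_\theta(z_t, \varnothing) + s \left( \epsilon_\theta(z_t, c) - \epsilon_\theta(z_t, \varnothing) \right)
$$

Under this convention, $s = 1$ reduces to the plain conditional prediction, and larger $s$ pushes samples harder toward the prompt, which changes the generated images that the attack model is trained on.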

Abstract

The rise of generative image models leads to privacy concerns when it comes to the huge datasets used to train such models. This paper investigates the possibility of inferring whether a set of face images was used for fine-tuning a Latent Diffusion Model (LDM). A Membership Inference Attack (MIA) method is presented for this task. Using generated auxiliary data for the training of the attack model leads to significantly better performance, and so does the use of watermarks. The guidance scale used for inference was found to have a significant influence. If an LDM is fine-tuned for long enough, the text prompt used for inference has no significant influence. The proposed MIA is found to be viable in a realistic black-box setup against LDMs fine-tuned on face images.
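
The attack pipeline described above can be illustrated with a toy sketch. It is not the authors' implementation: the images are replaced by synthetic Gaussian feature vectors, the attack model is a small logistic regression rather than an image classifier, and every name (`make_features`, `train_attack_model`, `auc`) is hypothetical. The sketch also assumes that the outputs of the target model share a distribution with its fine-tuning data, which is a simplification.

```python
import math
import random


def make_features(n, shift, rng, dim=4):
    """Hypothetical stand-in for image features: Gaussian vectors with a mean shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_attack_model(positives, negatives, epochs=200, lr=0.1):
    """Fit a logistic regression (toy M_A) on D_Train = positives + negatives."""
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient descent step on the mean log-loss.
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def score(model, x):
    """Membership score in [0, 1] for one query feature vector."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def auc(pos_scores, neg_scores):
    """Probability that a random member outscores a random non-member."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


rng = random.Random(0)
generated_by_target = make_features(200, 1.0, rng)  # positives: toy outputs of M_T
auxiliary = make_features(200, 0.0, rng)            # negatives: toy auxiliary data
attack_model = train_attack_model(generated_by_target, auxiliary)

# Query set Q: toy members (stand-in for D_Target) mixed with non-members.
members = make_features(100, 1.0, rng)
non_members = make_features(100, 0.0, rng)
attack_auc = auc([score(attack_model, x) for x in members],
                 [score(attack_model, x) for x in non_members])
print(f"AUC: {attack_auc:.3f}")
```

The AUC here only measures how well the toy classifier separates two synthetic distributions; it says nothing about the numbers reported in the paper.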

Paper Structure

This paper contains 26 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An approach a malicious actor could use to obtain an LDM trained on illegitimately obtained data.
  • Figure 2: The approach taken to train the attack model $\mathbf{M_{A}}$. The images in $\mathbf{D_{Target}}$ are used to train $\mathbf{M_T}$, which then outputs images that are used as positives in $\mathbf{D_{Train}}$. The negatives in $\mathbf{D_{Train}}$ come from an auxiliary dataset. A model is then trained on $\mathbf{D_{Train}}$ using supervised learning to produce the final attack model $\mathbf{M_A}$.
  • Figure 3: These are examples of the data in the different generated image datasets which are used to train the attack model.
  • Figure 4: A graphic representation of how an attack model $\mathbf{M_A}$ could be built and later queried with an image $\mathbf{Q}$.
  • Figure 5: $\mathbf{M_{A}}$ is given a set $\mathbf{Q}$ of images combined from $\mathbf{D_{Target}}$ and an auxiliary image dataset. This allows the performance of $\mathbf{M_{A}}$ to be evaluated.
  • ...and 3 more figures