Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights

Dixi Yao

TL;DR

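This paper asks whether sharing LoRA fine-tuned diffusion model weights leaks the private images used for fine-tuning. It designs and builds a variational network autoencoder that takes model weights as input and outputs reconstructions of the private images, and proposes a training paradigm that uses timestep embedding to make training this autoencoder more efficient. The answer is surprising: an adversary with access only to the weights can generate images containing the same identities as the private images.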

Abstract

With the emerging trend in generative models and convenient public access to diffusion models pre-trained on large datasets, users can fine-tune these models to generate images of personal faces or items in new contexts described by natural language. Parameter-efficient fine-tuning (PEFT), such as Low-Rank Adaptation (LoRA), has become the most common way to save memory and computation on the user end during fine-tuning. However, a natural question is whether the private images used for fine-tuning will be leaked to adversaries when model weights are shared. In this paper, we study the issue of privacy leakage of a fine-tuned diffusion model in a practical setting, where adversaries only have access to model weights, rather than prompts or images used for fine-tuning. We design and build a variational network autoencoder that takes model weights as input and outputs the reconstruction of private images. To improve the efficiency of training such an autoencoder, we propose a training paradigm with the help of timestep embedding. The results give a surprising answer to this research question: an adversary can generate images containing the same identities as the private images. Furthermore, we demonstrate that no existing defense method, including differential privacy-based methods, can preserve the privacy of private data used for fine-tuning a diffusion model without compromising the utility of the fine-tuned model.

Paper Structure

This paper contains 46 sections, 1 theorem, 2 equations, 14 figures, 5 tables, and 3 algorithms.

Key Result

Theorem 6.1

A randomized mechanism $\mathcal{M}$ mapping data from $\mathcal{X}$ to $\mathcal{Y}$ is $(\epsilon,\delta)$-differentially private if, for any two datasets $x, x'\in \mathcal{X}$ that differ in at most one entry and for any set of outputs $y\subseteq\mathcal{Y}$, it holds that

$$\Pr[\mathcal{M}(x)\in y] \le e^{\epsilon}\,\Pr[\mathcal{M}(x')\in y] + \delta.$$
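
To make the definition concrete, the sketch below checks the standard $(\epsilon,\delta)$ inequality, $\Pr[\mathcal{M}(x)\in y] \le e^{\epsilon}\Pr[\mathcal{M}(x')\in y] + \delta$, by brute force over every output subset for a toy mechanism. The mechanism here is classic randomized response on a single private bit. It is an illustrative assumption and is not one of the defenses evaluated in the paper, and the function names are hypothetical.

```python
import math
from itertools import chain, combinations


def randomized_response_dist(bit, epsilon):
    """Output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    otherwise the flipped bit is reported.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def all_subsets(outputs):
    """Enumerate every subset of a finite output space."""
    items = list(outputs)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def satisfies_dp(dist_x, dist_x_prime, epsilon, delta, tol=1e-12):
    """Check the (eps, delta) inequality in both directions for all subsets."""
    outputs = set(dist_x) | set(dist_x_prime)
    bound = math.exp(epsilon)
    for subset in all_subsets(outputs):
        p = sum(dist_x.get(o, 0.0) for o in subset)
        q = sum(dist_x_prime.get(o, 0.0) for o in subset)
        # tol absorbs floating-point rounding when the bound is tight.
        if p > bound * q + delta + tol or q > bound * p + delta + tol:
            return False
    return True


def check_randomized_response(mechanism_epsilon, claimed_epsilon, delta=0.0):
    """Neighbouring datasets are the single-entry datasets {0} and {1}."""
    d0 = randomized_response_dist(0, mechanism_epsilon)
    d1 = randomized_response_dist(1, mechanism_epsilon)
    return satisfies_dp(d0, d1, claimed_epsilon, delta)


if __name__ == "__main__":
    print(check_randomized_response(1.0, 1.0))             # tight guarantee
    print(check_randomized_response(1.0, 0.5))             # claim too strong
    print(check_randomized_response(1.0, 0.5, delta=0.3))  # slack from delta
```

A smaller $\epsilon$ forces the two output distributions closer together, which is the source of the privacy-utility tension the abstract refers to. The check is exhaustive only because the output space here has two elements. It does not scale to mechanisms that output model weights.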

Figures (14)

  • Figure 1: The private images, covering different types of images, used for fine-tuning Stable Diffusion V-1.4.
  • Figure 2: The overall framework of fine-tuning a diffusion model and our attack method.
  • Figure 3: The structure of the neural network encoder.
  • Figure 4: The structure of the matrix encoder, where the corresponding LoRA matrix has dimension $W\times H$. The kernel size of the convolution layer is 3.
  • Figure 5: Images generated by Stable Diffusion V-1.4 fine-tuned with images of Elon Musk, and reconstructions by different attack methods.
  • ...and 9 more figures

Theorems & Definitions (1)

  • Theorem 6.1