DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia, Hongyi Miao, Yingjie Zhou, Linhan Cao, Yanwei Jiang, Wangqiu Zhou, Dandan Zhu, Hua Yang, Wei Sun, Xiongkuo Min, Guangtao Zhai

TL;DR

This work addresses the privacy risks posed by diffusion-model customization, where unauthorized users can customize models via fine-tuning or zero-shot methods using only a few images. It proposes DLADiff, a dual-layer defense that combines a first-layer protection based on Dual-Surrogate Models (DSUR) and Alternating Dynamic Fine-Tuning (ADFT) to thwart fine-tuning, with a second-layer perturbation that leverages multiple identity encoders and PGD to resist zero-shot generation. Empirical results show DLADiff outperforms state-of-the-art defenses in both defense scenarios, achieving lower quality outputs for unauthorized models and significantly reducing identity leakage across diverse encoders and adapters. The framework offers a practical, generalizable approach to protecting personal portraits in diffusion-model applications, with potential impact on privacy-preserving image synthesis and responsible deployment of generative AI.

Abstract

With the rapid advancement of diffusion models, a variety of fine-tuning methods have been developed, enabling high-fidelity image generation that closely resembles the target content using only 3 to 5 training images. More recently, zero-shot generation methods have emerged, capable of producing highly realistic outputs from a single reference image without altering model weights. However, these technological advancements have also introduced significant risks to facial privacy. Malicious actors can exploit diffusion model customization with just a few images of a person, or even one, to create synthetic identities nearly identical to the original. Although research has begun to focus on defending against diffusion model customization, most existing defense methods target fine-tuning approaches and neglect zero-shot generation. To address this issue, this paper proposes Dual-Layer Anti-Diffusion (DLADiff) to defend against both fine-tuning methods and zero-shot methods. DLADiff contains a dual-layer protective mechanism. The first layer provides effective protection against unauthorized fine-tuning by leveraging the proposed Dual-Surrogate Models (DSUR) mechanism and Alternating Dynamic Fine-Tuning (ADFT), which integrates adversarial training with the prior knowledge derived from pre-fine-tuned models. The second layer, though simple in design, demonstrates strong effectiveness in preventing image generation through zero-shot methods. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in defending against fine-tuning of diffusion models and achieves unprecedented performance in protecting against zero-shot generation.
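
The TL;DR describes the second layer as a perturbation optimized with PGD against multiple identity encoders. The sketch below illustrates that general idea on a toy problem and is not the paper's implementation. Several things in it are assumptions made for illustration: the encoders are random linear maps standing in for real face identity encoders, the objective is the mean cosine similarity between clean and perturbed embeddings, and the names `pgd_multi_encoder`, `eps`, `alpha`, and `steps` as well as their default values are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Apply a linear stand-in 'encoder' W (list of rows) to a flat image x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad(W, x_adv, target):
    """Gradient of cosine(W @ x_adv, target) with respect to x_adv."""
    u = matvec(W, x_adv)
    nu, nt = norm(u), norm(target)
    c = cosine(u, target)
    # d cos / d u = target / (|u||t|) - cos * u / |u|^2
    g_u = [t / (nu * nt) - c * a / (nu * nu) for a, t in zip(u, target)]
    # Chain rule through the linear map: W^T g_u
    return [sum(row[j] * g for row, g in zip(W, g_u)) for j in range(len(x_adv))]


def mean_similarity(x_adv, x, encoders):
    """Average identity similarity between x_adv and x over all encoders."""
    sims = [cosine(matvec(W, x_adv), matvec(W, x)) for W in encoders]
    return sum(sims) / len(sims)


def pgd_multi_encoder(x, encoders, eps=8 / 255, alpha=1 / 255, steps=50, seed=0):
    """Toy PGD: lower the mean embedding similarity under an L-inf budget eps."""
    rng = random.Random(seed)
    targets = [matvec(W, x) for W in encoders]  # clean identity embeddings
    # Random start: at delta = 0 the similarity is maximal and its gradient is zero.
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(steps):
        x_adv = [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]
        grad = [0.0] * len(x)
        for W, t in zip(encoders, targets):
            g = cosine_grad(W, x_adv, t)
            grad = [a + b / len(encoders) for a, b in zip(grad, g)]
        # Signed descent step on similarity, then projection onto the eps-ball.
        delta = [min(eps, max(-eps, d - alpha * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    # Keep the protected image inside the valid pixel range [0, 1].
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


# Illustrative usage with three random 8x16 stand-in encoders.
demo_rng = random.Random(1)
image = [demo_rng.uniform(0.2, 0.8) for _ in range(16)]
toy_encoders = [[[demo_rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
                for _ in range(3)]
protected = pgd_multi_encoder(image, toy_encoders)
print("mean similarity after PGD:", mean_similarity(protected, image, toy_encoders))
```

Averaging the gradient over several encoders is what pushes the perturbation to transfer across them rather than overfit to one. With real, nonlinear encoders the analytic gradient here would be replaced by automatic differentiation.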

Paper Structure

This paper contains 24 sections, 11 equations, 11 figures, 10 tables, 2 algorithms.

Figures (11)

  • Figure 1: The DLADiff framework protects personal photos by simultaneously resisting fine-tuning and zero-shot generation in diffusion models, significantly degrading the output quality of maliciously customized models. Some of the visualization results in this paper may cause discomfort to viewers.
  • Figure 2: The optimization process of the first layer of protective perturbation in DLADiff. This layer can effectively defend against fine-tuning-based diffusion model customization. The optimization process includes four steps. Step-0 pre-fine-tunes a static surrogate model, denoted as $\mathbf{UNet_s}$, on a clean dataset that shares the same identity as the images to be protected. Step-1 optimizes the perturbation $\delta_{ft}$ by disrupting the attention maps, using $\mathbf{UNet_s}$ as a reference. Step-2 and Step-3 perform optimization based on adversarial training. Step-1 through Step-3 are repeated until the preset number of epochs is reached.
  • Figure 3: Comparison results for defending against DreamBooth and LoRA fine-tuning. We use the protected images to fine-tune a pretrained Stable Diffusion model. Then, we generate images under diverse random seeds using the fine-tuned weights.
  • Figure 4: Comparison results for defending against zero-shot generation methods.
  • Figure 5: The user interface of subjective assessment experiments.
  • ...and 6 more figures