PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models

Xinwei Liu, Xiaojun Jia, Yuan Xun, Hua Zhang, Xiaochun Cao

TL;DR

PersGuard addresses privacy and copyright risks in personalized text-to-image generation by implanting backdoors in upstream pre-trained diffusion models. It formulates a unified optimization with three losses (Backdoor Behavior Loss, Prior Preservation Loss, and Backdoor Retention Loss) to implant protective backdoors of three types (Pattern, Erasure, Target) that survive downstream personalization while remaining inactive for unprotected data. Experiments across white-box and gray-box settings, multiple protected categories, and facial identities show that PersGuard can reliably trigger protective outputs for protected images with minimal impact on general generation, and that it outperforms Anti-DreamBooth. The work demonstrates a practical, model-level approach to safeguarding privacy and rights in diffusion-based personalization, with insights into robustness under transformations and gray-box scenarios.

Abstract

Diffusion models (DMs) have revolutionized data generation, particularly in text-to-image (T2I) synthesis. However, the widespread use of personalized generative models raises significant concerns regarding privacy violations and copyright infringement. To address these issues, researchers have proposed adversarial perturbation-based protection techniques. These methods have notable limitations, including insufficient robustness against data transformations and the inability to fully eliminate identifiable features of protected objects in the generated output. In this paper, we introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images. Unlike traditional adversarial perturbation methods, PersGuard implants backdoor triggers into pre-trained T2I models, preventing the generation of customized outputs for designated protected images while allowing normal personalization for unprotected ones. Unfortunately, existing backdoor methods for T2I diffusion models cannot be applied to personalization scenarios, because their backdoor objectives differ and because the backdoor may be eliminated during downstream fine-tuning. To address these challenges, we propose three novel backdoor objectives specifically designed for personalization scenarios, coupled with a backdoor retention loss engineered to resist downstream fine-tuning. These components are integrated into a unified optimization framework. Extensive experimental evaluations demonstrate PersGuard's effectiveness in preserving data privacy, even under challenging conditions including gray-box settings, multi-object protection, and facial identity scenarios. Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
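
As a rough illustration of what a "unified optimization framework" over the three named losses could look like, the sketch below combines three scalar loss terms into a single objective. The weighted-sum form, the `LossWeights` class, the `unified_objective` function, and the default weights are all assumptions made for illustration; this page does not state how the paper actually balances or computes the terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical balancing weights (not taken from the paper)."""
    behavior: float = 1.0   # weight on the Backdoor Behavior Loss
    prior: float = 1.0      # weight on the Prior Preservation Loss
    retention: float = 1.0  # weight on the Backdoor Retention Loss


def unified_objective(
    behavior_loss: float,
    prior_loss: float,
    retention_loss: float,
    weights: LossWeights = LossWeights(),
) -> float:
    """Combine the three loss terms as a weighted sum (an assumed form).

    The three inputs are treated as already-computed scalars; how each
    term is computed from the model is outside the scope of this sketch.
    """
    return (
        weights.behavior * behavior_loss
        + weights.prior * prior_loss
        + weights.retention * retention_loss
    )


if __name__ == "__main__":
    # Placeholder values, only to show the call shape.
    total = unified_objective(1.0, 2.0, 3.0, LossWeights(1.0, 0.5, 2.0))
    print(total)
```

In a real training loop the three inputs would be tensors produced by the model rather than plain floats, and the weights would be tuned hyperparameters. The sketch only shows how separate objectives can be folded into one quantity to minimize.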

Paper Structure

This paper contains 25 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparison of different protection methods against unauthorized model personalization: (a) unprotected model personalization process, (b) protection through adversarial perturbations that disrupt training outputs, and (c) our proposed PersGuard using backdoor to generate protective outputs while maintaining normal results for unprotected images.
  • Figure 2: Overview of PersGuard, consisting of Pattern-Backdoor, Erasure-Backdoor and Target-Backdoor.
  • Figure 3: Comparison of Anti-DB and PersGuard evaluated by LLMs and CLIP-I. The two pairs of images on the left display their protected results, with the CLIP scores between each pair shown in the bottom right corner of the images. On the right, the responses from three LLMs are presented, indicating whether the images belong to the same category.
  • Figure 4: Loss curves comparison between clean model and backdoored models during fine-tuning. The shaded regions represent the variance of loss values.
  • Figure 5: CLIP Score curves during personalization fine-tuning for different backdoor types.
  • ...and 1 more figures