PEPPER: Perception-Guided Perturbation for Robust Backdoor Defense in Text-to-Image Diffusion Models

Oscar Chew, Po-Yi Lu, Jayden Lin, Kuan-Hao Huang, Hsuan-Tien Lin

TL;DR

PEPPER addresses backdoor vulnerabilities in text-to-image diffusion models by applying perception-guided textual perturbations to rewritten captions, extending prompts with unobtrusive details to disrupt embedded triggers while preserving visual fidelity. The method, driven by GPT-4, produces semantically distant yet visually similar captions, proving especially effective against text-encoder–based attacks and able to function as a plug-and-play module with existing defenses. Across short and long prompts, PEPPER substantially lowers attack success rates with minimal impact on output quality, and its effectiveness improves when combined with T2IShield or UFID. This work advances robust diffusion-model deployment by providing a general, adaptable defense that enhances security across diverse backdoor attack families.

Abstract

Recent studies show that text-to-image (T2I) diffusion models are vulnerable to backdoor attacks, where a trigger in the input prompt can steer generation toward harmful or unintended content. To address this, we introduce PEPPER (PErcePtion Guided PERturbation), a backdoor defense that rewrites the caption into a semantically distant yet visually similar caption while adding unobtrusive elements. With this rewriting strategy, PEPPER disrupts the trigger embedded in the input prompt, dilutes the influence of trigger tokens, and thereby achieves enhanced robustness. Experiments show that PEPPER is particularly effective against text-encoder-based attacks, substantially reducing attack success while preserving generation quality. Beyond this, PEPPER can be paired with any existing defense, yielding consistently stronger and more generalizable robustness than any standalone method. Our code will be released on GitHub.
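
The rewriting step described above can be pictured as a prompt pre-processing wrapper placed in front of the diffusion model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the instruction text, the function names (`build_rewrite_request`, `defend_prompt`), and the stand-in `toy_llm` are all hypothetical, and the actual GPT-4 prompt used by PEPPER is not given in this summary.

```python
from typing import Callable

# Hypothetical instruction text (an assumption for illustration only).
REWRITE_INSTRUCTION = (
    "Rewrite the caption so that it depicts the same visual scene with "
    "different wording, then append one or two unobtrusive visual details. "
    "Return only the rewritten caption.\n\nCaption: {caption}"
)


def build_rewrite_request(caption: str) -> str:
    """Fill the (hypothetical) instruction template with the user's caption."""
    return REWRITE_INSTRUCTION.format(caption=caption)


def defend_prompt(caption: str, rewrite_llm: Callable[[str], str]) -> str:
    """Return the rewritten caption to pass to the T2I model.

    `rewrite_llm` is any callable that maps a request string to a reply
    string, for example a thin wrapper around a chat-completion client.
    If the reply is empty, this sketch falls back to the original caption;
    a stricter deployment might refuse to generate instead.
    """
    rewritten = rewrite_llm(build_rewrite_request(caption)).strip()
    return rewritten or caption


def toy_llm(request: str) -> str:
    """Stand-in for a real LLM: looks up a canned rewrite for a known caption."""
    caption = request.rsplit("Caption: ", 1)[-1]
    canned = {
        "a dog on the beach":
            "a puppy standing on a sandy shore, gentle waves behind it",
    }
    return canned.get(caption, "")


if __name__ == "__main__":
    print(defend_prompt("a dog on the beach", toy_llm))
```

Because the wrapper only changes the text that reaches the model, it can in principle sit in front of other defenses, which is consistent with the plug-and-play use described in the abstract.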

Paper Structure

This paper contains 21 sections, 3 figures, 9 tables.

Figures (3)

  • Figure 1: After synonym replacement, the generated image still contains the attack target, highlighting its limitation against more aggressive attacks.
  • Figure 2: The target zebra appears in the generation under the short-prompt setting but not under the long-prompt setting.
  • Figure 3: PEPPER moves outside the attacked region and recovers the intended image by rewriting the prompt to a semantically shifted yet visually similar phrase.