Understanding Implosion in Text-to-Image Generative Models

Wenxin Ding, Cathy Y. Li, Shawn Shan, Ben Y. Zhao, Haitao Zheng

TL;DR

This work establishes the first analytical framework for the robustness of image generative models to poisoning attacks by modeling and analyzing the behavior of the cross-attention mechanism in latent diffusion models, and formally quantifies the impact of training data through the hardness of alignment, measured by an Alignment Difficulty (AD) metric.

Abstract

Recent works show that text-to-image generative models are surprisingly vulnerable to a variety of poisoning attacks. Empirical results find that these models can be corrupted by altering associations between individual text prompts and associated visual features. Furthermore, a number of concurrent poisoning attacks can induce "model implosion," where the model becomes unable to produce meaningful images for unpoisoned prompts. These intriguing findings highlight the absence of an intuitive framework to understand poisoning attacks on these models. In this work, we establish the first analytical framework for the robustness of image generative models to poisoning attacks, by modeling and analyzing the behavior of the cross-attention mechanism in latent diffusion models. We model cross-attention training as an abstract problem of "supervised graph alignment" and formally quantify the impact of training data by the hardness of alignment, measured by an Alignment Difficulty (AD) metric. The higher the AD, the harder the alignment. We prove that AD increases with the number of individual prompts (or concepts) poisoned. As AD grows, the alignment task becomes increasingly difficult, yielding highly distorted outcomes that frequently map meaningful text prompts to undefined or meaningless visual representations. As a result, the generative model implodes and broadly outputs random, incoherent images. We validate our analytical framework through extensive experiments, and we confirm and explain the unexpected (and unexplained) effect of model implosion while producing new, unforeseen insights. Our work provides a useful tool for studying poisoning attacks against diffusion models and their defenses.
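Because the framework is built around cross-attention, it may help to recall how that layer works. The sketch below is a minimal, single-head, plain-Python version of standard scaled dot-product cross-attention, softmax(QK^T / sqrt(d))V, assuming (as in Stable Diffusion-style latent diffusion models) that image-latent features supply the queries and text-token embeddings supply the keys and values. The toy vectors, the two-token prompt, and all function names are made up for illustration; the column of attention weights for one token is, in simplified form, the kind of per-word cross-attention map shown in the figures.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    result = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: one row per image-latent position; keys/values: one row per text token.
    Returns the attention weights (positions x tokens) and the attended features.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return weights, matmul(weights, values)


# Toy prompt with two tokens, "photo" (index 0) and "bird" (index 1).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
# Four latent positions: the first two align with the "bird" key, the last two do not.
queries = [[0.0, 2.0], [0.0, 2.0], [2.0, 0.0], [2.0, 0.0]]

weights, output = cross_attention(queries, keys, values)
bird_map = [row[1] for row in weights]  # per-position attention to "bird"
print([round(w, 2) for w in bird_map])  # [0.8, 0.8, 0.2, 0.2]
```

In a real model, such a map would be reshaped to the latent's spatial grid and is often aggregated over attention heads, layers, and denoising steps before it is visualized.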

Paper Structure

This paper contains 27 sections, 1 theorem, 19 equations, 12 figures, 6 tables.

Key Result

Theorem 4.2

When benign samples of different concepts are well-separated in both visual and textual embedding spaces, there exists a configuration of the poisoned training data $\mathcal{T}$ such that AD increases with $C_P$, the number of poisoned concepts in $\mathcal{T}$.
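The intuition behind Theorem 4.2 can be illustrated with a toy count. The sketch below is not the paper's AD metric, whose definition is not reproduced here; it uses a hypothetical proxy: the number of sample pairs that are adjacent in exactly one of the training subgraphs $\mathcal{G}_{txt}^{\mathcal{T}}$ and $\mathcal{G}_{img}^{\mathcal{T}}$ (Figure 5). It assumes the theorem's well-separated setting in its most extreme form: two samples are similar exactly when their captions name, or their images depict, the same concept. Each poisoned concept pairs its caption with images of a distinct target, following the fish-to-bicycle, bird-to-chandelier, and apple-to-hat examples in the figures; the sample counts and function names are arbitrary.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# A training sample reduced to (concept named by its caption, concept shown in its image).
Sample = Tuple[str, str]


def build_training_set(concepts: List[str], poison_map: Dict[str, str],
                       benign_per_concept: int, poison_per_concept: int) -> List[Sample]:
    """Benign samples pair each concept's caption with its own images; poisoned
    samples pair a source concept's caption with a target concept's images."""
    samples = [(c, c) for c in concepts for _ in range(benign_per_concept)]
    for source, target in poison_map.items():
        samples.extend([(source, target)] * poison_per_concept)
    return samples


def mismatched_edges(samples: List[Sample]) -> int:
    """Hypothetical proxy (NOT the paper's AD): count pairs adjacent in exactly one
    of the text graph and the image graph, assuming perfectly separated concepts."""
    count = 0
    for (text_a, image_a), (text_b, image_b) in combinations(samples, 2):
        text_edge = text_a == text_b     # edge in the text-side training subgraph
        image_edge = image_a == image_b  # edge in the image-side training subgraph
        if text_edge != image_edge:
            count += 1
    return count


concepts = ["fish", "bicycle", "bird", "chandelier", "apple", "hat"]
all_poisons = {"fish": "bicycle", "bird": "chandelier", "apple": "hat"}

proxy_by_cp = []
for c_p in range(len(all_poisons) + 1):
    poison_map = dict(list(all_poisons.items())[:c_p])  # poison the first c_p concepts
    samples = build_training_set(concepts, poison_map, benign_per_concept=5, poison_per_concept=5)
    proxy_by_cp.append(mismatched_edges(samples))

print(proxy_by_cp)  # [0, 50, 100, 150]
```

With 5 benign and 5 poisoned samples per concept, each poisoned concept adds 5 × 5 disagreements against its own benign captions and 5 × 5 against the target's benign images, so the proxy grows as 50 · $C_P$. Real embeddings are not perfectly separated, so this toy only mirrors the direction of the theorem, not its proof.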

Figures (12)

  • Figure 1: Training pipeline of latent diffusion models.
  • Figure 2: Poisoning a single concept: images generated by "a photo of bird" and their cross-attention maps with respect to "bird": (a) benign model, (b) model where fish is poisoned to bicycle, and (c) model where bird is poisoned to chandelier.
  • Figure 3: Impact of model implosion on unpoisoned concepts -- images and cross-attention maps generated for seashell and balloon as the models are poisoned with an increasing number of concepts.
  • Figure 4: Model implosion in different training scenarios -- images and cross-attention maps generated for apple (poisoned to hat) and turtle (unpoisoned).
  • Figure 5: Illustration of $\mathcal{G}_{img}$ and $\mathcal{G}_{txt}$ and subgraphs $\mathcal{G}_{txt}^{\mathcal{T}}$, $\mathcal{G}_{img}^{\mathcal{T}}$ from the labeled training data $\mathcal{T}$. Each vertex represents an embedding. Each edge represents high similarity between vertices. Low similarity edges are omitted.
  • ...and 7 more figures
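Figure 5's construction, where each vertex is an embedding and each edge marks high similarity, can be sketched as a thresholded similarity graph. The cosine-similarity measure, the 0.8 threshold, and the function names below are assumptions made for illustration, since the caption does not specify how "high similarity" is decided; the training subgraphs $\mathcal{G}_{txt}^{\mathcal{T}}$ and $\mathcal{G}_{img}^{\mathcal{T}}$ are treated here as the subgraphs induced by the vertices that appear in $\mathcal{T}$.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(embeddings: List[Sequence[float]], threshold: float = 0.8) -> Set[Edge]:
    """One vertex per embedding; keep an edge (i, j), i < j, only for high similarity.
    Low-similarity edges are omitted, as in Figure 5."""
    return {
        (i, j)
        for i, j in combinations(range(len(embeddings)), 2)
        if cosine(embeddings[i], embeddings[j]) >= threshold
    }


def induced_subgraph(edges: Set[Edge], vertices: Sequence[int]) -> Set[Edge]:
    """Restrict a graph to the vertices that appear in the training data."""
    keep = set(vertices)
    return {(i, j) for i, j in edges if i in keep and j in keep}


# Toy 2-D text embeddings: vertices 0 and 1 are near-duplicates, as are 2 and 3.
text_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
g_txt = similarity_graph(text_embeddings)
g_txt_train = induced_subgraph(g_txt, vertices=[0, 1, 2])

print(sorted(g_txt))        # [(0, 1), (2, 3)]
print(sorted(g_txt_train))  # [(0, 1)]
```

The same two functions would apply unchanged to image embeddings to obtain the image-side graph; a fixed threshold is the simplest choice, and a k-nearest-neighbor rule would be an equally plausible alternative.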

Theorems & Definitions (3)

  • Conjecture 4.1: Effectiveness of Poisoning a Single Concept
  • Theorem 4.2: Benefits of Poisoning More Concepts
  • Conjecture 4.3: Model Implosion