ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation

Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao

TL;DR

This work formalizes probabilistic robustness for text-to-image diffusion models under stochastic perturbations and introduces ProTIP, an efficient verification framework that uses sequential analysis and adaptive concentration bounds to provide statistical guarantees on a model's robustness. By encoding perturbations semantically with CLIP-based similarity and evaluating distributional invariance via two-sample tests, ProTIP identifies adversarial examples with reduced sample complexity. It delivers practical contributions including runtime-efficient AE testing, lower-bound robustness guarantees, and a method to rank defense strategies against text perturbations. The approach is validated on multiple Stable Diffusion variants with COCO prompts, demonstrating both accuracy in robustness assessment and actionable insights for defense selection, alongside an open-source repository for replication.

Abstract

Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.

Paper Structure (25 sections, 4 theorems, 13 equations, 15 figures, 3 tables)


Key Result

Theorem: Adaptive Hoeffding's Inequality

We know $I(x'_i)$ is a binary 0-1 random variable. Let $\hat{\mu}_I^{(n)}=\frac{1}{n}\sum_{i=1}^{n} I(x'_i)$ (i.e., the sample mean). Also let $J$ be a random variable on $\mathbb{N} \cup \{\infty\}$, and $\varepsilon(\sigma, n) = \sqrt{\frac{0.6 \cdot \log(\log_{1.1}n+1) + 1.8^{-1} \cdot \log\left(\frac{24}{\sigma}\right)}{n}}$. Then $\Pr\left(\left|\hat{\mu}_I^{(J)} - R_M\right| \leq \varepsilon(\sigma, J)\right) \geq 1-\sigma$, where $R_M$ is the true population mean of $I(x'_i)$, and $\sigma$ is a given confidence level.
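
As a rough illustration of how a bound of this kind could drive adaptive sampling, the sketch below evaluates $\varepsilon(\sigma, n)$ after every new perturbation and stops as soon as a target lower bound on robustness is either established or ruled out. It assumes the constants of the adaptive Hoeffding inequality of Zhao et al. (2016), in particular a $\log(24/\sigma)$ term and division by $n$ under the square root. The function names, the `is_non_ae` callback, the `target` parameter, and the convention that $I(x')=1$ marks a non-adversarial perturbation are hypothetical choices for this example, not taken from the ProTIP repository.

```python
import math
import random
from typing import Callable, Tuple


def adaptive_epsilon(sigma: float, n: int) -> float:
    """Half-width of the adaptive interval after n samples.

    Assumed form: sqrt((0.6*log(log_{1.1} n + 1) + log(24/sigma)/1.8) / n).
    """
    if n < 1 or not 0.0 < sigma < 1.0:
        raise ValueError("need n >= 1 and 0 < sigma < 1")
    iterated_log = math.log(math.log(n, 1.1) + 1.0)
    return math.sqrt((0.6 * iterated_log + math.log(24.0 / sigma) / 1.8) / n)


def verify_lower_bound(
    is_non_ae: Callable[[], bool],
    target: float,
    sigma: float,
    max_samples: int = 10_000,
) -> Tuple[str, int, float]:
    """Draw perturbations until the interval around the sample mean clears the target.

    is_non_ae is a hypothetical callback: it samples one perturbation and
    returns True if that perturbation is NOT an adversarial example.
    Returns (verdict, samples_used, sample_mean).
    """
    hits = 0
    for n in range(1, max_samples + 1):
        hits += int(is_non_ae())
        mean = hits / n
        eps = adaptive_epsilon(sigma, n)
        if mean - eps >= target:   # lower bound already above the target
            return "verified", n, mean
        if mean + eps < target:    # upper bound already below the target
            return "refuted", n, mean
    return "undecided", max_samples, hits / max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated model whose true robustness is 0.9 (purely illustrative).
    verdict, used, mean = verify_lower_bound(lambda: rng.random() < 0.9, 0.7, 0.05)
    print(verdict, used, round(mean, 3))
```

Because the bound is meant to hold at a data-dependent stopping time $J$, the loop may check the interval after every sample without spending extra confidence budget on each check, which is what lets the number of perturbations adapt to how far the estimate sits from the target.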

Figures (15)

  • Figure 1: Examples illustrating perturbations applied to the prompt for Stable Diffusion.
  • Figure 2: Four common formulations of robustness verification in DL: binary (a), worst-case (b & c), and probabilistic (d) robustness.
  • Figure 3: Workflow of ProTIP.
  • Figure 4: ProTIP results for a given prompt, with different confidence levels $1-\sigma$ (a)(b); Mean & Std. (shaded area) of RLB over 107 prompts (c).
  • Figure 5: (a) Number of perturbations (out of 400) identified as Non-AEs/AEs at each interim stage (S) in the sequential hypothesis testing. (b) ProTIP with the (adaptive) sample size of 78 vs. Hoeffding's inequality with fixed sample size 50, 100, and 200.
  • ...and 10 more figures

Theorems & Definitions (9)

  • Definition: Probabilistic Robustness
  • Definition: Probabilistic Robustness of T2I DMs
  • Definition: Verification Target
  • Theorem: Adaptive Hoeffding's Inequality
  • Theorem: Hoeffding's Inequality
  • Corollary: Tightness of the two bounds
  • Proof
  • Corollary: Monotonicity to $n$ and $\sigma$
  • Proof
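
To make the relationship between the two inequalities listed above more concrete, the following sketch compares the classical Hoeffding half-width for a fixed sample size with the adaptive half-width at the same $n$ and $\sigma$. It assumes the standard two-sided Hoeffding form $\sqrt{\log(2/\sigma)/(2n)}$ for samples in $[0,1]$ and, for the adaptive bound, the constants of Zhao et al. (2016) including a $\log(24/\sigma)$ term; the paper's exact statements may differ, and all function names are illustrative.

```python
import math


def hoeffding_epsilon(sigma: float, n: int) -> float:
    """Two-sided Hoeffding half-width for n i.i.d. samples in [0, 1] (n fixed in advance)."""
    return math.sqrt(math.log(2.0 / sigma) / (2.0 * n))


def adaptive_epsilon(sigma: float, n: int) -> float:
    """Adaptive half-width (assumed form), intended to hold at a random stopping time."""
    iterated_log = math.log(math.log(n, 1.1) + 1.0)
    return math.sqrt((0.6 * iterated_log + math.log(24.0 / sigma) / 1.8) / n)


def hoeffding_sample_size(sigma: float, eps: float) -> int:
    """Smallest fixed n whose Hoeffding half-width is at most eps."""
    return math.ceil(math.log(2.0 / sigma) / (2.0 * eps ** 2))


if __name__ == "__main__":
    sigma = 0.05
    print("n, Hoeffding half-width, adaptive half-width")
    for n in (50, 100, 200):
        print(n, round(hoeffding_epsilon(sigma, n), 4), round(adaptive_epsilon(sigma, n), 4))
    print("fixed n needed for eps = 0.1:", hoeffding_sample_size(sigma, 0.1))
```

Under these assumed forms, both half-widths shrink as $n$ grows and widen as $\sigma$ decreases, and the adaptive half-width is the wider of the two at any given $n$. What it offers in exchange is that the sample size does not have to be fixed before sampling starts.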