Discovering Universal Semantic Triggers for Text-to-Image Synthesis

Shengfang Zhai, Weilong Wang, Jiajun Li, Yinpeng Dong, Hang Su, Qingni Shen

TL;DR

Universal Semantic Triggers expose a latent vulnerability in text-to-image synthesis: meaningless token sequences inserted into a prompt can steer generation toward a predefined semantic target. The authors introduce Semantic Gradient-based Search (SGS) to automate trigger discovery, and SemSR, a metric formulated in CLIP embedding space, to quantify the resulting semantic shift. Experiments on Stable Diffusion and real-world SaaS platforms show broad susceptibility across target semantics, with longer triggers and certain insertion positions yielding stronger effects, which underscores the need for automated pre-deployment auditing. The work highlights ethical and safety implications and provides a framework for systematically auditing and mitigating hidden semantic triggers in diffusion-based generation systems.

Abstract

Recently, text-to-image models have gained widespread attention in the community due to their controllable and high-quality generation ability. However, the robustness of such models and their potential ethical issues have not been fully explored. In this paper, we introduce the Universal Semantic Trigger, a meaningless token sequence that can be added at any location within the input text yet induces generated images towards a preset semantic target. To thoroughly investigate it, we propose the Semantic Gradient-based Search (SGS) framework. SGS automatically discovers potential universal semantic triggers based on the given semantic targets. Furthermore, we design evaluation metrics to comprehensively evaluate the semantic shift of images caused by these triggers. Our empirical analyses reveal that mainstream open-source text-to-image models are vulnerable to our triggers, which could pose significant ethical threats. Our work contributes to a further understanding of text-to-image synthesis and helps users automatically audit their models before deployment.

Paper Structure (21 sections, 9 equations, 5 figures, 4 tables)

This paper contains 21 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Illustration of the SGS framework. SGS adopts a gradient-based search method. We first construct an explicit semantic sentence for the training data, and then optimize the trigger tokens by encouraging the trigger-inserted text embedding to be close to the embedding of the explicit semantic sentence. The universal triggers generated by SGS are diverse and input-agnostic.
  • Figure 2: The impact of insertion position on the effectiveness of our triggers. The vertical axis represents the insertion position used when generating triggers, while the horizontal axis represents the position where the trigger is inserted in the text for testing.
  • Figure 3: Examples of our triggers on three versions of Midjourney under the black-box setting. Red boxes highlight the examples that are triggered successfully.
  • Figure 4: Examples of ensemble triggers, which indicate the potentially significant threats posed by universal semantic triggers, such as the efficient creation of malicious memes (Qu et al., 2023).
  • Figure 5: Examples of universal semantic triggers on other text-to-image models.
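
The gradient-based search summarized in the Figure 1 caption can be illustrated with a small toy sketch. This is not the authors' implementation: the mean-pooling `encode` function is a hypothetical stand-in for a real text encoder such as CLIP, the first-order candidate scoring is one common way to do gradient-guided token search and is assumed here rather than taken from the paper, and all names (`search_trigger`, `avg_similarity`, `top_k`, and so on) are illustrative. The sketch only shows the general idea: pick trigger tokens so that the embedding of each trigger-inserted prompt moves closer to a target embedding.

```python
import numpy as np

def encode(E, token_ids):
    """Toy stand-in for a text encoder (assumption): mean of token embeddings."""
    return E[token_ids].mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_grad(z, t):
    """Gradient of cos(z, t) with respect to z."""
    nz, nt = np.linalg.norm(z), np.linalg.norm(t)
    return t / (nz * nt) - (z @ t) * z / (nz ** 3 * nt)

def insert(prompt, trigger, pos):
    """Insert the trigger token ids into the prompt at index pos."""
    return np.concatenate([prompt[:pos], trigger, prompt[pos:]])

def avg_similarity(E, prompts, trigger, pos, target):
    """Mean cosine similarity between trigger-inserted prompts and the target."""
    return float(np.mean(
        [cosine(encode(E, insert(p, trigger, pos)), target) for p in prompts]
    ))

def search_trigger(E, prompts, target, init_trigger, pos=0, top_k=8, n_rounds=3):
    """Greedy, gradient-guided token search (hypothetical sketch)."""
    trigger = np.array(init_trigger).copy()
    best = avg_similarity(E, prompts, trigger, pos, target)
    for _ in range(n_rounds):
        for i in range(len(trigger)):
            # Average gradient of the similarity w.r.t. the embedding in slot i.
            # With mean pooling, each token receives 1/len(ids) of the gradient.
            grad = np.zeros(E.shape[1])
            for p in prompts:
                ids = insert(p, trigger, pos)
                grad += cosine_grad(encode(E, ids), target) / len(ids)
            grad /= len(prompts)
            # First-order estimate of the gain from swapping in each vocabulary token.
            scores = (E - E[trigger[i]]) @ grad
            candidates = np.argsort(scores)[-top_k:]
            # Re-evaluate the shortlisted tokens exactly; keep only real improvements.
            for c in candidates:
                trial = trigger.copy()
                trial[i] = c
                s = avg_similarity(E, prompts, trial, pos, target)
                if s > best:
                    best, trigger = s, trial
    return trigger, best

# Toy data: random vocabulary embeddings, prompts, and target sentence (all made up).
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 16))
prompts = [rng.integers(0, 200, size=6) for _ in range(4)]
target = encode(E, rng.integers(0, 200, size=5))
init = rng.integers(0, 200, size=3)

start = avg_similarity(E, prompts, init, 0, target)
trigger, score = search_trigger(E, prompts, target, init, pos=0)
print("similarity before search:", start)
print("similarity after search: ", score)
```

Because a swap is accepted only when the exact similarity improves, the final score can never fall below the starting one. In a realistic setting the toy encoder would be replaced by the text encoder of the model under audit, and the gradient would come from automatic differentiation rather than the closed form used here.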