Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

Yi Liu, Guowei Yang, Gelei Deng, Feiyue Chen, Yuqi Chen, Ling Shi, Tianwei Zhang, Yang Liu

TL;DR

Groot addresses the safety risks of NSFW content generation in text-to-image models by introducing an automated adversarial testing framework that uses tree-based semantic transformation. It integrates a Prompt Parse Tree (PPT) to semantically decompose prompts and a Sensitive Element Drowning strategy to overwhelm image safety filters, guided by LLMs for goal-oriented refinement. Evaluations across DALL·E 3, Midjourney, and Stable Diffusion XL show Groot achieving a 93.66% success rate, substantially outperforming baselines such as SneakyPrompt. The work provides open-source code and datasets, offering a scalable, reusable approach for safety evaluation and attack-surface mapping in multimodal generation systems.

Abstract

With the prevalence of text-to-image generative models, their safety has become a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rates and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL·E 3 and Midjourney.

Paper Structure

This paper contains 21 sections, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: Image Generation Process of DALL·E 3.
  • Figure 2: Motivation example demonstrating the operation of text and image safety filters in DALL·E 3, filtering out safe and adversarial content through different stages of the model's processing pipeline.
  • Figure 3: The workflow of Groot.
  • Figure 4: Hierarchical parsing of prompts in PPT. (a) Basic prompt with object nodes. (b) Addition of setting via 'Contain' node. (c) Inclusion of attribute nodes for detailed context.
  • Figure 5: Comparison of PPT representations showing (a) an initial prompt with the word 'blood' being blocked by a text safety filter, and (b) the refined prompt using 'red liquid' to bypass the filter through semantic decomposition.
  • ...and 4 more figures