
SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation

Sumin Yu, Taesup Moon

TL;DR

SP-Guard tackles unsafe content generation in text-to-image diffusion models by introducing prompt-adaptive and region-selective guidance. It estimates prompt harmfulness from the cosine similarity between the prompt-induced noise direction and unsafe-concept directions, and applies a per-timestep mask that suppresses only the unsafe regions while preserving benign content. Across four unsafe-prompt datasets, SP-Guard delivers strong safety gains with minimal loss of image fidelity and outperforms prior inference-time methods in content preservation and controllability. The approach improves the trustworthiness of generative AI and offers a practical path toward adaptable safety in multimodal systems and beyond.
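The harmfulness estimate described above can be illustrated with a minimal NumPy sketch. It assumes that noise predictions are available as arrays for the unconditional, prompt-conditioned, and unsafe-concept inputs at one timestep. The names `harmfulness_score` and `adaptive_scale`, taking the maximum over concepts, and the linear mapping from score to guidance strength are hypothetical choices for illustration, not the paper's exact formulation.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def harmfulness_score(eps_uncond, eps_prompt, eps_unsafe_list):
    """Hypothetical harmfulness estimate: the largest cosine similarity between
    the prompt-induced direction (eps_prompt - eps_uncond) and each
    unsafe-concept direction (eps_unsafe - eps_uncond)."""
    prompt_dir = eps_prompt - eps_uncond
    return max(cosine_similarity(prompt_dir, e - eps_uncond) for e in eps_unsafe_list)


def adaptive_scale(score: float, max_scale: float = 10.0) -> float:
    """Hypothetical mapping from score to guidance strength: zero guidance for
    non-positive similarity, linearly stronger guidance as similarity nears 1."""
    return max_scale * float(np.clip(score, 0.0, 1.0))


# Toy noise predictions standing in for U-Net outputs at one timestep.
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(4, 8, 8))
unsafe_concept = rng.normal(size=(4, 8, 8))
eps_unsafe = eps_uncond + unsafe_concept
eps_benign = eps_uncond + rng.normal(size=(4, 8, 8))
eps_harmful = eps_uncond + unsafe_concept + 0.1 * rng.normal(size=(4, 8, 8))

for name, eps_p in [("benign", eps_benign), ("harmful", eps_harmful)]:
    s = harmfulness_score(eps_uncond, eps_p, [eps_unsafe])
    print(name, round(s, 3), round(adaptive_scale(s), 2))
```

In this toy setup the "harmful" prompt direction nearly coincides with the unsafe-concept direction and so receives a score close to 1 and strong guidance, while the unrelated "benign" direction scores near 0 and receives little or none.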

Abstract

While diffusion-based text-to-image (T2I) models have achieved remarkable image quality, they also make it easy to create harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength to the prompt) and selectivity (targeting only the unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask that guides only the unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
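To make selective guidance concrete, the following sketch applies a masked safety term on top of standard classifier-free guidance for a single denoising step. The function `selective_guidance`, the threshold-based mask rule, and the default scales are hypothetical assumptions chosen for illustration; SP-Guard's actual mask construction and guidance formula may differ.

```python
import numpy as np


def selective_guidance(eps_uncond, eps_prompt, eps_unsafe,
                       cfg_scale=7.5, safety_scale=2.0, threshold=0.5):
    """One step of a hypothetical selectively masked safety guidance.

    Assumptions (illustrative, not taken from the paper):
    - the mask marks latent elements where the prompt's noise prediction lies
      within `threshold` of the unsafe-concept prediction;
    - inside the mask, the estimate is pushed away from the unsafe direction
      (eps_unsafe - eps_uncond); outside it, plain classifier-free guidance is kept.
    """
    mask = (np.abs(eps_prompt - eps_unsafe) < threshold).astype(eps_prompt.dtype)
    unsafe_dir = eps_unsafe - eps_uncond
    cfg = eps_uncond + cfg_scale * (eps_prompt - eps_uncond)
    return cfg - safety_scale * mask * unsafe_dir, mask


# Three latent elements: the first and third are close to the unsafe prediction.
eps_uncond = np.zeros(3)
eps_prompt = np.array([1.0, 1.0, 0.0])
eps_unsafe = np.array([1.0, -1.0, 0.0])
guided, mask = selective_guidance(eps_uncond, eps_prompt, eps_unsafe)
print(mask)    # [1. 0. 1.]
print(guided)  # [5.5 7.5 0. ]
```

Because the mask is recomputed from the current noise predictions, calling this function at every denoising step yields a per-timestep mask. Elements outside the mask receive exactly the ordinary classifier-free guidance update, which is how benign content is left untouched in this sketch.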

Paper Structure

This paper contains 10 sections, 8 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Limitations of Safe Latent Diffusion (SLD) in safety guidance. See the Safety Considerations section for details.
  • Figure 2: Guiding process of SP-Guard.
  • Figure 3: Qualitative comparison of methods for removing inappropriate content.
  • Figure 4: Trade-off between safety improvement and content preservation. Points further to the upper right indicate safer image generation with better content preservation.
  • Figure 5: Evaluation of $\lambda(t)$ variations. This plot shows the impact of varying SP-Guard's $\lambda$.