SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation
Sumin Yu, Taesup Moon
TL;DR
SP-Guard tackles unsafe content generation in text-to-image diffusion models by introducing prompt-adaptive and region-selective guidance. It estimates prompt harmfulness via the cosine similarity between the prompt-induced noise direction and unsafe concept directions, and applies a per-timestep mask that suppresses only the unsafe regions, preserving benign content. Empirical results across four unsafe datasets show that SP-Guard delivers strong safety gains with minimal loss of image fidelity, outperforming prior inference-time methods in content preservation and controllability. The approach improves the trustworthiness of generative AI and offers a practical path toward adaptable safety in multimodal systems.
Abstract
While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also enable easy creation of harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask to guide only unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
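
To make the two ideas concrete, the following is a minimal NumPy sketch of one guidance step that is both adaptive and selective. It is an illustration under stated assumptions, not the authors' implementation. The text above specifies only that harmfulness is estimated from a cosine similarity between noise directions and that a mask restricts guidance to unsafe regions. Everything else here is hypothetical: the function name `selective_adaptive_guidance`, the parameters `cfg_scale`, `safety_scale`, and `mask_quantile`, the clipping of negative similarities to zero, the quantile-based mask, and the way the terms are combined.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two arrays, computed on flattened copies."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def selective_adaptive_guidance(eps_uncond, eps_prompt, eps_unsafe,
                                cfg_scale=7.5, safety_scale=5.0,
                                mask_quantile=0.8):
    """One guided noise estimate for a single timestep (illustrative only).

    All inputs are noise predictions of shape (C, H, W):
      eps_uncond  - unconditional prediction
      eps_prompt  - prediction conditioned on the user prompt
      eps_unsafe  - prediction conditioned on an unsafe concept
    """
    prompt_dir = eps_prompt - eps_uncond   # prompt-induced noise direction
    unsafe_dir = eps_unsafe - eps_uncond   # unsafe concept direction

    # Adaptivity (assumption): scale the safety term by how strongly the
    # prompt direction aligns with the unsafe direction; negative
    # similarities are clipped so benign prompts receive no safety push.
    harm = max(0.0, cosine_similarity(prompt_dir, unsafe_dir))

    # Selectivity (assumption): keep only spatial locations where the
    # unsafe direction is strongest, using a per-timestep quantile threshold.
    magnitude = np.linalg.norm(unsafe_dir, axis=0)            # (H, W)
    threshold = np.quantile(magnitude, mask_quantile)
    mask = (magnitude >= threshold).astype(eps_uncond.dtype)  # (H, W)

    # Classifier-free guidance toward the prompt, minus a masked,
    # harmfulness-weighted push away from the unsafe concept.
    guided = (eps_uncond
              + cfg_scale * prompt_dir
              - safety_scale * harm * mask[None] * unsafe_dir)
    return guided, harm, mask
```

In this sketch, a prompt whose noise direction does not align with the unsafe direction yields `harm == 0`, so the step reduces to ordinary classifier-free guidance and the image is left unchanged. For a harmful prompt, only the masked locations are pushed away from the unsafe concept.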
