SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing

Ruiyang Zhang; Jiahao Luo; Xiaoru Feng; Qiufan Pang; Yaodong Yang; Juntao Dai

TL;DR

This work addresses safety in text-to-image generation by moving beyond prompt- or output-level filtering to a post-hoc safety editing paradigm. It introduces MR-SafeEdit, a large multi-round image-text interleaved dataset, and SafeEditor, a unified multimodal LLM trained to iteratively edit unsafe generations while preserving user intent. Across multiple datasets and generation models, the approach reduces over-refusal and achieves a favorable safety-utility balance. Because it operates only at the output stage, it is model-agnostic and works as a plug-in for any generator. The contributions include the dataset, the SafeEditor model, comprehensive experiments and ablations, and a discussion of limitations and future directions for safety alignment in multimodal generation.

Abstract

With the rapid advancement of text-to-image (T2I) models, ensuring their safety has become increasingly critical. Existing safety approaches can be categorized into training-time and inference-time methods. While inference-time methods are widely adopted due to their cost-effectiveness, they often suffer from limitations such as over-refusal and an imbalance between safety and utility. To address these challenges, we propose a multi-round safety editing framework that functions as a model-agnostic, plug-and-play module, enabling efficient safety alignment for any text-to-image model. Central to this framework is MR-SafeEdit, a multi-round image-text interleaved dataset specifically constructed for safety editing in text-to-image generation. We introduce a post-hoc safety editing paradigm that mirrors the human cognitive process of identifying and refining unsafe content. To instantiate this paradigm, we develop SafeEditor, a unified MLLM capable of multi-round safety editing on generated images. Experimental results show that SafeEditor surpasses prior safety approaches by reducing over-refusal while achieving a more favorable safety-utility balance.
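The post-hoc, multi-round paradigm described above can be sketched as a wrapper around an arbitrary generator: generate once, then repeatedly judge and edit the output until it is deemed safe or a round budget runs out. This is a minimal illustration under stated assumptions, not SafeEditor's implementation: the `SafetyEditor.step` interface, the `max_rounds` budget, and the toy "images" (sets of content tags) are all hypothetical; in the work itself, the editing role is played by a unified MLLM trained on MR-SafeEdit.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Protocol, Tuple

Image = FrozenSet[str]  # toy stand-in (assumption): an "image" is a set of content tags


@dataclass
class Round:
    """One editing round: the instruction issued and the image it produced."""
    instruction: str
    image: Image


@dataclass
class EditResult:
    image: Image
    rounds: List[Round] = field(default_factory=list)
    safe: bool = False


class SafetyEditor(Protocol):
    """Hypothetical editor interface (not SafeEditor's actual API).

    Returns None if the image is judged safe, otherwise an
    (edit instruction, edited image) pair for the next round.
    """
    def step(self, prompt: str, image: Image,
             history: List[Round]) -> Optional[Tuple[str, Image]]: ...


def post_hoc_safety_edit(prompt: str,
                         generate: Callable[[str], Image],
                         editor: SafetyEditor,
                         max_rounds: int = 3) -> EditResult:
    """Wrap any T2I generator: generate once, then edit until judged safe."""
    image = generate(prompt)          # the base model itself is untouched (plug-in)
    history: List[Round] = []
    for _ in range(max_rounds):
        proposal = editor.step(prompt, image, history)
        if proposal is None:          # judged safe: stop editing, no refusal
            return EditResult(image, history, safe=True)
        instruction, image = proposal
        history.append(Round(instruction, image))
    # Budget exhausted: one last safety check, discarding any further proposed edit.
    return EditResult(image, history, safe=editor.step(prompt, image, history) is None)


# --- Toy stand-ins for illustration only (hypothetical) ---
UNSAFE = {"blood", "weapon"}


def toy_generate(prompt: str) -> Image:
    return frozenset(prompt.split())


class ToyEditor:
    def step(self, prompt, image, history):
        flagged = sorted(image & UNSAFE)
        if not flagged:
            return None
        tag = flagged[0]              # fix one issue per round; keep everything else
        return f"remove {tag}", image - {tag}


result = post_hoc_safety_edit("knight weapon blood castle", toy_generate, ToyEditor())
print(sorted(result.image), result.safe, [r.instruction for r in result.rounds])
# → ['castle', 'knight'] True ['remove blood', 'remove weapon']
```

Editing one issue per round while carrying the history forward loosely mirrors the "identify, then refine" process the abstract describes, and stopping as soon as the image is judged safe keeps the benign parts of the request instead of refusing it outright.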

Figures (23)