The Path to Reconciling Quality and Safety in Text-to-Image Generation: Dataset, Method, and Evaluation
Shouwei Ruan, Zhenyu Wu, Yao Huang, Ruochen Zhang, Yitong Sun, Caixin Kang, Shiji Zhao, Xingxing Wei
TL;DR
This work tackles the enduring safety-quality trade-off in text-to-image generation by introducing a unified framework that integrates data, method, and evaluation. It builds LibraAlign-100K, the first large-scale dataset with dual safety and quality annotations, and derives LibraAlign-HF via safety-aware inpainting to produce high-fidelity safe examples, all underpinned by a Safety Cost Model. The core methodological contribution, Synergistic Preference Optimization (T2I-SPO), combines a quality reward with a safety penalty into a composite objective and supplements training with a Dynamic Focusing Mechanism to emphasize hard cases. Evaluation is unified through the Unified Alignment Score (UAScore), which jointly considers sample quality and safety, enabling fair comparisons across safety-alignment methods. Empirically, T2I-SPO achieves state-of-the-art safety across diverse NSFW concepts while preserving generation quality, and demonstrates robustness to adversarial prompts, establishing a principled baseline for safe and creative T2I deployment.
Abstract
Content safety is a fundamental challenge for text-to-image (T2I) models, yet prevailing methods enforce a debilitating trade-off between safety and generation quality. We argue that mitigating this trade-off hinges on addressing systemic challenges in current T2I safety alignment across data, methods, and evaluation protocols. To this end, we introduce a unified framework for synergistic safety alignment. First, to overcome the flawed data paradigm that provides biased optimization signals, we develop LibraAlign-100K, the first large-scale dataset with dual annotations for safety and quality. Second, to address the myopic optimization of existing methods that focus solely on safety rewards, we propose Synergistic Preference Optimization (T2I-SPO), a novel alignment algorithm that extends the DPO paradigm with a composite reward function integrating generation safety and quality to holistically model user preferences. Finally, to overcome the limitations of quality-agnostic and binary evaluation in current protocols, we introduce the Unified Alignment Score (UAScore), a holistic, fine-grained metric that fairly quantifies the balance between safety and generative capability. Extensive experiments demonstrate that T2I-SPO achieves state-of-the-art safety alignment against a wide range of NSFW concepts, while better maintaining the model's generation quality and general capability.
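The idea of extending DPO with a composite safety/quality reward can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the composite reward is taken to be a quality reward minus a weighted safety cost (the weight `lam` is hypothetical), the preferred image in each pair is the one with the higher composite reward, and training uses the standard DPO loss on policy and reference log-likelihoods. The optional `gamma` term is a focal-style weighting swapped in as a stand-in for the Dynamic Focusing Mechanism, whose actual form is not given here. All names (`Sample`, `composite_reward`, `spo_pair_loss`, `beta`, `lam`, `gamma`) are hypothetical, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One generated image with hypothetical scalar scores and log-likelihoods."""
    quality: float        # quality reward (higher is better)
    safety_cost: float    # safety cost (higher means less safe)
    logp_policy: float    # log-likelihood under the model being trained
    logp_ref: float       # log-likelihood under the frozen reference model


def composite_reward(s: Sample, lam: float = 1.0) -> float:
    """Quality reward minus a weighted safety penalty (lam is an assumed trade-off weight)."""
    return s.quality - lam * s.safety_cost


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def spo_pair_loss(a: Sample, b: Sample, beta: float = 0.1,
                  lam: float = 1.0, gamma: float = 0.0) -> float:
    """DPO-style loss where the preferred sample is chosen by composite reward.

    gamma > 0 applies a focal-style weight that up-weights hard pairs; this is
    an assumed stand-in for the Dynamic Focusing Mechanism, not its definition.
    """
    # Order the pair so that `win` has the higher composite reward.
    if composite_reward(a, lam) >= composite_reward(b, lam):
        win, lose = a, b
    else:
        win, lose = b, a
    # Implicit reward margin: difference of policy/reference log-ratios.
    margin = beta * ((win.logp_policy - win.logp_ref)
                     - (lose.logp_policy - lose.logp_ref))
    p_correct = math.exp(log_sigmoid(margin))  # sigmoid(margin)
    weight = (1.0 - p_correct) ** gamma        # equals 1.0 when gamma == 0
    return -weight * log_sigmoid(margin)
```

For example, when the policy and reference assign equal log-likelihoods to both images, the margin is zero and the loss is log 2 (about 0.693). Raising the policy's likelihood of the safer, higher-composite-reward image lowers the loss; setting `lam=0` lets quality alone pick the winner, which can flip the preference toward an unsafe but higher-quality image, illustrating the single-objective bias the composite reward is meant to avoid.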
