Personalized Safety Alignment for Text-to-Image Diffusion Models
Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Hongcheng Gao, Xiao Zhang, Rex Ying
TL;DR
This work addresses the mismatch between rigid, one-size-fits-all safety mechanisms and users' diverse safety expectations by introducing Personalized Safety Alignment (PSA) for text-to-image diffusion models. PSA combines a lightweight User-Cross-Attention Adapter with Sage, a large-scale dataset of 1,000 simulated user profiles, to condition generation on individual safety boundaries, yielding a calibrated trade-off between safety and visual quality. Trained with a Diffusion-DPO-inspired objective, PSA outperforms static baselines and prompt-engineering approaches, adhering closely to user-specific constraints while maintaining high perceptual fidelity, even for unseen users. The results suggest that personalization can drive more adaptive, user-centered, and responsible generative AI; code, data, and models are publicly released to enable broader adoption.
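To make the Diffusion-DPO-inspired training more concrete, the following is a minimal sketch of a Diffusion-DPO-style preference loss for a single image pair. It is not the paper's exact objective: the function name, the `beta` hyperparameter, and the assumption that "preferred" and "dispreferred" are decided by a user's safety profile are all illustrative, and the denoising errors are passed in as plain numbers rather than computed by a model.

```python
import math

def diffusion_dpo_loss(err_w, err_w_ref, err_l, err_l_ref, beta=1.0):
    """Diffusion-DPO-style loss for one (preferred, dispreferred) image pair.

    err_w, err_l         : squared denoising errors of the trainable model
    err_w_ref, err_l_ref : squared denoising errors of the frozen reference model
    w / l                : image the user profile accepts / rejects (assumed labeling)
    beta                 : hypothetical preference-strength hyperparameter
    """
    # Negative when the trainable model fits the preferred image better,
    # and the dispreferred image worse, than the reference model does.
    margin = (err_w - err_w_ref) - (err_l - err_l_ref)
    # -log(sigmoid(-z)) equals softplus(z); computed in a numerically stable form.
    z = beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the trainable model matches the reference on both images, the loss is log 2; it drops below that value as the model shifts probability toward the image the profile accepts.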
Abstract
Text-to-image diffusion models have revolutionized visual content generation, yet their deployment is hindered by a fundamental limitation: safety mechanisms enforce rigid, uniform standards that fail to reflect diverse user preferences shaped by age, culture, or personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that transitions generative safety from static filtration to user-conditioned adaptation. We introduce Sage, a large-scale dataset capturing diverse safety boundaries across 1,000 simulated user profiles, covering complex risks often missed by traditional datasets. By integrating these profiles via a parameter-efficient cross-attention adapter, PSA dynamically modulates generation to align with individual sensitivities. Extensive experiments demonstrate that PSA achieves a calibrated safety-quality trade-off: under permissive profiles, it relaxes over-cautious constraints to enhance visual fidelity, while under restrictive profiles, it enforces state-of-the-art suppression, significantly outperforming static baselines. Furthermore, PSA exhibits superior instruction adherence compared to prompt-engineering methods, establishing personalization as a vital direction for creating adaptive, user-centered, and responsible generative AI. Our code, data, and models are publicly available at https://github.com/M-E-AGI-Lab/PSAlign.
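As a rough illustration of how a cross-attention adapter could inject a user profile into generation, the sketch below lets image features attend to user-profile tokens and adds the result back as a residual. This is an assumption-laden toy, not the PSA architecture: the names `user_cross_attention` and `scale` are hypothetical, the learned query/key/value projections are omitted (treated as identity), and the user tokens stand in for whatever embedding the profile encoder would produce.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def user_cross_attention(image_tokens, user_tokens, scale=1.0):
    """Add user-profile information to image features via cross-attention.

    image_tokens : list of d-dimensional vectors, used as queries
    user_tokens  : list of d-dimensional vectors, used as keys and values
    scale        : hypothetical adapter strength; 0.0 leaves features unchanged
    """
    d = len(image_tokens[0])
    out = []
    for q in image_tokens:
        # Scaled dot-product attention weights over the user tokens.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in user_tokens])
        # Weighted average of the user tokens for this image token.
        attended = [sum(w * v[i] for w, v in zip(weights, user_tokens))
                    for i in range(d)]
        # Residual connection: base features plus the scaled user signal.
        out.append([x + scale * a for x, a in zip(q, attended)])
    return out
```

Because the adapter output is added as a residual, setting `scale` to zero recovers the base features exactly, which is one common way to keep such an adapter lightweight and easy to disable.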
