Personalized Safety Alignment for Text-to-Image Diffusion Models

Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Hongcheng Gao, Xiao Zhang, Rex Ying

TL;DR

This work addresses the mismatch between rigid, one-size-fits-all safety mechanisms and users' diverse safety expectations by introducing Personalized Safety Alignment (PSA) for text-to-image diffusion models. PSA combines a lightweight User-Cross-Attention Adapter with Sage, a large-scale dataset built around 1,000 simulated user profiles, to condition generation on individual safety boundaries, enabling a calibrated trade-off between safety and visual quality. Trained with a Diffusion-DPO-inspired objective, PSA outperforms static baselines and prompt-engineering approaches, adhering closely to user-specific constraints while maintaining high perceptual fidelity, even for unseen users. The work shows that personalization can drive more adaptive, user-centered, and responsible generative AI; code, data, and models are publicly available to enable broader adoption.

Abstract

Text-to-image diffusion models have revolutionized visual content generation, yet their deployment is hindered by a fundamental limitation: safety mechanisms enforce rigid, uniform standards that fail to reflect diverse user preferences shaped by age, culture, or personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that transitions generative safety from static filtration to user-conditioned adaptation. We introduce Sage, a large-scale dataset capturing diverse safety boundaries across 1,000 simulated user profiles, covering complex risks often missed by traditional datasets. By integrating these profiles via a parameter-efficient cross-attention adapter, PSA dynamically modulates generation to align with individual sensitivities. Extensive experiments demonstrate that PSA achieves a calibrated safety-quality trade-off: under permissive profiles, it relaxes over-cautious constraints to enhance visual fidelity, while under restrictive profiles, it enforces state-of-the-art suppression, significantly outperforming static baselines. Furthermore, PSA exhibits superior instruction adherence compared to prompt-engineering methods, establishing personalization as a vital direction for creating adaptive, user-centered, and responsible generative AI. Our code, data, and models are publicly available at https://github.com/M-E-AGI-Lab/PSAlign.

Paper Structure

This paper contains 66 sections, 12 equations, 12 figures, 16 tables.

Figures (12)

  • Figure 1: The overview of PSA. PSA adapts text-to-image generation to individual user safety preferences by conditioning the model on user-specific profiles (Profile 1–3). In contrast to traditional one-size-fits-all methods that apply uniform suppression, PSA tailors safety alignment to each user's unique boundaries.
  • Figure 2: Visualizing Safety Diversity. t-SNE projection of 1,000 simulated user embeddings. The distinct clusters correspond to different safety archetypes, ranging from permissive to restrictive.
  • Figure 3: Sage Construction Pipeline. An adversarial prompt ($p^h$) and a safe rewrite ($p^s$) are generated for a concept. The resulting image pair $(x_0^s, x_0^h)$ is dynamically labeled as preferred/dispreferred based on the user profile.
  • Figure 4: The PSA Training Pipeline. (1) User profiles are used to create user-specific preference pairs $(x_0^+, x_0^-)$ based on our Sage dataset's logic (Eq. \ref{eq:pref_pair}). Based on the profile, banned concepts (e.g., Violence) become the negative sample, while allowed concepts (e.g., Self-Harm, for this user) become the positive sample. (2) A lightweight, trainable adapter injects the corresponding user embedding into the frozen cross-attention layers of the Denoising U-Net. (3) This adapter is then optimized by minimizing our proposed $\mathcal{L}_{\text{PSA}}$ to align the model's output with each user's unique safety boundaries.
  • Figure 5: Qualitative Comparison of Harmful Content Suppression on SDXL. Please refer to Appendix \ref{app:qual_comp_setup} for the corresponding prompts.
  • ...and 7 more figures
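
The pipeline described in the captions of Figures 3 and 4 can be illustrated with a small sketch. This is not the authors' implementation: the exact labeling rule (the referenced preference-pair equation) and the form of $\mathcal{L}_{\text{PSA}}$ are not reproduced in this listing. The sketch assumes that a user profile reduces to a set of banned concept categories, that a banned concept makes the safe image $x_0^s$ the preferred sample and the harmful image $x_0^h$ the dispreferred one (and the reverse for an allowed concept), and that the objective takes the standard Diffusion-DPO form on denoising errors. The names `UserProfile`, `preference_pair`, and `dpo_style_loss` are hypothetical, and the user-embedding adapter is omitted: the denoising errors are passed in as plain numbers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical profile: the concept categories this user bans."""
    user_id: str
    banned: frozenset


def preference_pair(profile, concept, x_safe, x_harmful):
    """Return (preferred, dispreferred) for one user and one concept.

    Assumed rule: if the user bans the concept, the image from the safe
    rewrite is preferred; otherwise the image from the original prompt is.
    """
    if concept in profile.banned:
        return x_safe, x_harmful
    return x_harmful, x_safe


def dpo_style_loss(err_pos, err_pos_ref, err_neg, err_neg_ref, beta=1.0):
    """Diffusion-DPO-style loss on scalar squared denoising errors.

    err_pos / err_neg: errors of the trainable model on the preferred and
    dispreferred samples; *_ref: errors of the frozen reference model.
    beta is an assumed scaling hyperparameter.
    """
    margin = (err_pos - err_pos_ref) - (err_neg - err_neg_ref)
    z = beta * margin
    # -log(sigmoid(-z)) equals softplus(z); computed in a stable form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    strict = UserProfile("u_strict", frozenset({"violence"}))
    permissive = UserProfile("u_permissive", frozenset())
    # The same image pair is labeled differently for the two users.
    print(preference_pair(strict, "violence", "img_safe", "img_harmful"))
    print(preference_pair(permissive, "violence", "img_safe", "img_harmful"))
    # The loss drops when the model fits the preferred sample better than
    # the reference does, and the dispreferred sample worse.
    print(dpo_style_loss(0.5, 1.0, 1.5, 1.0) < dpo_style_loss(1.0, 1.0, 1.0, 1.0))
```

Under these assumptions the same image pair yields opposite preference labels for a restrictive and a permissive user, which is what lets a single dataset of pairs supervise many different safety boundaries.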