Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization

Xingqi Wang, Xiaoyuan Yi, Xing Xie, Jia Jia

TL;DR

Without updating most model parameters and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and achieves faster convergence, surpassing several strong baselines and taking an initial step towards ethically aligned T2I models.

Abstract

Recent advancements in diffusion models trained on large-scale data have enabled the generation of images that are indistinguishable from human-created ones, yet these models often produce harmful content misaligned with human values, e.g., social bias and offensive content. Despite extensive research on aligning Large Language Models (LLMs), the challenge of Text-to-Image (T2I) model alignment remains largely unexplored. To address this problem, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO optimizes only a plug-and-play value encoder that integrates a specified value principle with the input prompt, allowing control over both the semantics and the values of generated images. Specifically, we design a preference optimization loss tailored to diffusion models, which theoretically approximates the Bradley-Terry model used in LLM alignment but provides a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework to automatically construct a text-image preference dataset of 86k (prompt, aligned image, violating image, value principle) samples. Without updating most model parameters and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and achieves faster convergence, surpassing several strong baselines and taking an initial step towards ethically aligned T2I models.
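For readers unfamiliar with the Bradley-Terry model mentioned in the abstract, the following is a minimal sketch of the generic pairwise preference formulation, not LiVO's diffusion-tailored loss. The function names and the scalar scores are hypothetical, introduced only for illustration: each sample gets a scalar score, the probability that the preferred sample wins is the sigmoid of the score difference, and the loss is the negative log of that probability.

```python
import math


def bradley_terry_prob(score_preferred: float, score_rejected: float) -> float:
    """Probability that the preferred sample beats the rejected one.

    Generic Bradley-Terry form: sigmoid(score_preferred - score_rejected).
    The scores are hypothetical scalars (e.g. reward values), not LiVO outputs.
    """
    diff = score_preferred - score_rejected
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of the preference, i.e. softplus(-diff)."""
    diff = score_preferred - score_rejected
    # softplus(-diff) computed without overflow for either sign of diff.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    # The loss shrinks as the aligned image is scored further above the violating one.
    for aligned, violating in [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]:
        print(aligned, violating, round(bradley_terry_loss(aligned, violating), 4))
```

In a (prompt, aligned image, violating image, value principle) sample of the kind the abstract describes, the aligned image would play the role of the preferred item and the violating image the rejected one; how the paper actually scores the two images is not specified in this section.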

Paper Structure

This paper contains 29 sections, 20 equations, 13 figures, and 11 tables.

Figures (13)

  • Figure 1: (a) Biased images produced by DALL·E 2. (b) Pornographic images produced by Stable Diffusion. Sensitive content is masked. (c) LLMs can follow input value principles (marked in blue) and reduce harmfulness, while T2I models cannot.
  • Figure 2: Illustration of LiVO. For each prompt $\mathbf{x}$, LiVO retrieves a related value principle, which is then mapped into an embedding by the value encoder $E_{\theta}^v(\mathbf{x})$ to steer the generation direction. The value encoder is trained on paired preference images.
  • Figure 3: Further analysis of (a) data efficiency and of the trade-off between image quality and (b) social bias / (c) toxicity. Each tuple indicates a setting of $(\gamma_1,\gamma_2)$. UCE and DPO are omitted due to their poor results. Pareto frontiers are marked with dashed lines.
  • Figure 4: Training convergence. We show bias and toxicity scores evaluated on the test set at varying numbers of training steps.
  • Figure 5: Case study on debiasing (upper) and detoxification (bottom). We present images generated by SD, FD, UCE, CA, and LiVO. Images depicting males are highlighted in dark cyan, and those depicting females in pink. Images depicting toxic content are highlighted in red, and highly sensitive images are mosaicked to reduce offensiveness. Overall, LiVO achieves perfectly balanced attributes, the least toxic content, and minimal image quality degradation.
  • ...and 8 more figures