ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin

TL;DR

ID-Aligner introduces a universal reward-feedback framework for identity-preserving text-to-image generation, addressing identity fidelity and visual aesthetics through two targeted rewards: identity consistency and identity aesthetics. By leveraging face-detection/recognition feedback and human-preference data, it provides a unified objective that can be applied to both LoRA and Adapter diffusion models, yielding consistent performance gains. The approach demonstrates superior identity preservation and aesthetics on SD1.5 and SDXL, with strong generalization to alternative base models and favorable user feedback. This framework enables more flexible and scalable ID-T2I customization, with potential applications in AI portraits, advertising, and other identity-driven image synthesis tasks.
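The TL;DR mentions a unified objective built from the two rewards. One plausible way to write it is sketched below. The symbols are assumptions for illustration and are not taken from the paper: a face detector $D$, a face encoder $E$, an aesthetic reward model $R_{\text{aes}}$, weights $\lambda_{\text{id}}$ and $\lambda_{\text{aes}}$, the reference portrait $x_{\text{ref}}$, the generated image $\hat{x}_0$, and the prompt $c$.

```latex
\begin{aligned}
\mathcal{L}_{\text{id-cons}} &= 1 - \cos\!\big(E(D(x_{\text{ref}})),\; E(D(\hat{x}_0))\big) \\
\mathcal{L}_{\text{id-aes}}  &= -\,R_{\text{aes}}(\hat{x}_0, c) \\
\mathcal{L} &= \lambda_{\text{id}}\,\mathcal{L}_{\text{id-cons}} + \lambda_{\text{aes}}\,\mathcal{L}_{\text{id-aes}}
\end{aligned}
```

Under this reading, $\mathcal{L}$ depends only on the generated image, so its gradient can be routed to whichever parameters are trainable, LoRA weights or Adapter weights. This is one way to interpret the claim that the same objective serves both model families.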

Abstract

The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I), in particular, has received significant attention due to its wide range of application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to accurately maintain the identity characteristics of reference portraits; (2) the generated images lack aesthetic appeal, especially when identity retention is enforced; and (3) existing methods are not compatible with LoRA-based and Adapter-based approaches simultaneously. To address these issues, we present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance. To counter the loss of identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in the generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project Page: https://idaligner.github.io/
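The abstract describes identity consistency reward fine-tuning only at a high level. The following is a minimal sketch, assuming the face detector and face recognition model have already produced embedding vectors for the reference face and the generated face, and assuming the reward is their cosine similarity, so the loss is one minus it. The function names and the "no face detected" handling are hypothetical, not the authors' code.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_consistency_loss(
    ref_embedding: Sequence[float],
    gen_embedding: Optional[Sequence[float]],
) -> Optional[float]:
    """Hypothetical identity-consistency loss, 1 - cos(ref, gen).

    Assumption: gen_embedding is None when the face detector found no face
    in the generated image, and such samples give no identity feedback.
    """
    if gen_embedding is None:
        return None
    return 1.0 - cosine_similarity(ref_embedding, gen_embedding)


if __name__ == "__main__":
    ref = [1.0, 0.0]
    print(identity_consistency_loss(ref, [1.0, 0.0]))  # same direction: 0.0
    print(identity_consistency_loss(ref, [0.0, 1.0]))  # orthogonal: 1.0
    print(identity_consistency_loss(ref, None))        # no face detected: None
```

Minimizing this loss pushes the generated face embedding toward the reference direction. Real face embeddings have hundreds of dimensions, but the computation is the same.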

Paper Structure (19 sections, 8 equations, 10 figures, 4 tables, 2 algorithms)

Figures (10)

  • Figure 1: Overview of the proposed ID-Aligner. Our method uses a face detector and a face encoder to achieve identity preservation via feedback learning. We further incorporate an aesthetic reward model to improve the visual appeal of the generated results. Our method is a general framework that can be applied to both LoRA and Adapter methods.
  • Figure 2: Illustration of the aesthetic feedback data construction. We take an "AI + Expert" approach to generating the feedback data. Left: automatic construction of feedback data on character structure generation, where we use ControlNet to deliberately generate structure-distorted negative samples. Right: human-annotated preference data over images.
  • Figure 3: Visual comparison of different Adapter-based identity-conditioned generation methods based on SD1.5 and SDXL.
  • Figure 4: Visual results of LoRA-based ID-Aligner on SDXL.
  • Figure 5: The effectiveness ablation of the proposed identity consistency reward (ID-Cons) and the identity aesthetic reward (ID-Aes).
  • ...and 5 more figures
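The Figure 2 caption describes two sources of preference pairs for the aesthetic reward: human-annotated comparisons, and automatically built pairs in which a ControlNet-generated, structure-distorted image serves as the negative. The caption does not say how the reward model is trained on these pairs. A common choice for preference data, assumed here rather than taken from the paper, is the Bradley-Terry pairwise loss, which pushes the score of the preferred image above that of the rejected one. The sketch takes reward-model scores as plain floats, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss, -log(sigmoid(s_preferred - s_rejected)).

    Written with log1p so large margins of either sign do not overflow.
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins: -log(sigmoid(m)) = -m + log(1 + exp(m))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average pairwise loss over (preferred_score, rejected_score) pairs.

    Hypothetical setup: a pair is either a human-annotated preference or an
    intact image versus a structure-distorted negative made with ControlNet.
    """
    losses = [pairwise_preference_loss(pref, rej) for pref, rej in pairs]
    if not losses:
        raise ValueError("need at least one (preferred, rejected) pair")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Scores would come from the reward model being trained; these are made up.
    batch = [(2.0, -1.0), (0.5, 0.7)]
    print(round(mean_preference_loss(batch), 4))
```

Because each data source only has to supply an ordered (preferred, rejected) pair, the same loss can cover both halves of Figure 2 without special-casing either one.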