InstructBooth: Instruction-following Personalized Text-to-Image Generation
Daewon Chae, Nokyung Park, Jinkyu Kim, Kimin Lee
TL;DR
InstructBooth tackles the challenge of generating personalized text-to-image outputs that preserve a user-specific subject while still following the text prompt. It couples a DreamBooth-like personalization stage, which binds the subject to a unique identifier, with a subsequent reinforcement learning fine-tuning stage that maximizes a text-alignment reward, mitigating overfitting and expanding contextual diversity. The approach introduces detailed subject descriptions for rare subjects and uses prompts both with and without the identifier to stabilize RL training. It achieves superior text fidelity and competitive subject fidelity compared to baselines, as confirmed by human judgments and DreamBench benchmarks. This two-stage, reward-driven framework makes personalized T2I models more useful for expressive, context-rich generation, and it also highlights considerations for safe deployment and future improvements in evaluation datasets and watermarking strategies.
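As a rough illustration of the "prompts both with and without identifiers" idea, the sketch below pairs each context with a personalized and an identifier-free prompt. The function name `build_rl_prompts`, the placeholder token `[V]`, and the prompt template are assumptions made for this example; the source does not specify the actual prompt format or how the two prompt types are mixed during RL training.

```python
from typing import List


def build_rl_prompts(
    class_noun: str,
    contexts: List[str],
    identifier: str = "[V]",  # hypothetical placeholder for the unique identifier
) -> List[str]:
    """Return one prompt with and one without the identifier for each context."""
    prompts = []
    for context in contexts:
        # Personalized prompt: refers to the specific subject via the identifier.
        prompts.append(f"a {identifier} {class_noun} {context}")
        # Identifier-free prompt: refers only to the generic class.
        prompts.append(f"a {class_noun} {context}")
    return prompts


if __name__ == "__main__":
    for prompt in build_rl_prompts("dog", ["riding a bike", "wearing a hat"]):
        print(prompt)
```

In a setup like this, both kinds of prompt would be scored by the same text-alignment reward; how they are weighted against each other is left open here.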
Abstract
Personalizing text-to-image models using a limited set of images for a specific object has been explored in subject-specific image generation. However, existing methods often face challenges in aligning with text prompts due to overfitting to the limited training images. In this work, we introduce InstructBooth, a novel method designed to enhance image-text alignment in personalized text-to-image models without sacrificing the personalization ability. Our approach first personalizes text-to-image models with a small number of subject-specific images using a unique identifier. After personalization, we fine-tune personalized text-to-image models using reinforcement learning to maximize a reward that quantifies image-text alignment. Additionally, we propose complementary techniques to increase the synergy between these two processes. Our method demonstrates superior image-text alignment compared to existing baselines, while maintaining high personalization ability. In human evaluations, InstructBooth outperforms these baselines when all evaluation factors are considered together. Our project page is at https://sites.google.com/view/instructbooth.
