DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
TL;DR
DreamDistribution learns a distribution of soft prompts to customize pretrained T2I diffusion models for abstract, attribute-level personalization without fine-tuning the diffusion model itself. It models the prompt distribution with multiple learnable embeddings and optimizes them in embedding space using a simple reparameterization trick and an orthogonal loss. Sampling from this distribution yields diverse prompts that produce varied yet coherent images. The method supports text-guided editing and control over diversity, and can be adapted to other tasks such as text-to-3D. Quantitative and human evaluations on diverse reference sets show improved quality and diversity compared with baselines.
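The sketch below illustrates the two ingredients named above. It is a simplification that rests on several assumptions, and it is not the authors' implementation. It assumes that the K learnable prompts are each flattened into a single vector and that the distribution is a diagonal Gaussian whose mean and standard deviation are computed per dimension across those prompts. A new prompt is drawn with the reparameterization mu + scale * sigma * eps. It also assumes the orthogonal loss is the mean absolute cosine similarity between distinct prompt pairs. The `scale` knob is a hypothetical stand-in for the diversity control mentioned above. The sketch omits gradients and the diffusion model entirely.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def mean_std(prompts: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and population std over K prompt embeddings (assumed diagonal Gaussian)."""
    k = len(prompts)
    dim = len(prompts[0])
    mu = [sum(p[d] for p in prompts) / k for d in range(dim)]
    sigma = [math.sqrt(sum((p[d] - mu[d]) ** 2 for p in prompts) / k) for d in range(dim)]
    return mu, sigma


def sample_prompt(prompts: List[Vector], rng: random.Random, scale: float = 1.0) -> Vector:
    """Reparameterized sample mu + scale * sigma * eps with eps ~ N(0, I).

    `scale` is a hypothetical diversity knob: 0.0 returns the mean prompt.
    """
    mu, sigma = mean_std(prompts)
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def orthogonal_loss(prompts: List[Vector]) -> float:
    """Mean absolute cosine similarity over distinct pairs; 0.0 when prompts are pairwise orthogonal.

    Assumes every prompt has nonzero norm.
    """
    def unit(v: Vector) -> Vector:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    units = [unit(p) for p in prompts]
    total, pairs = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            total += abs(sum(a * b for a, b in zip(units[i], units[j])))
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical 4-dimensional prompt embeddings.
    prompts = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    print("sample:", sample_prompt(prompts, rng))
    print("orthogonal loss:", orthogonal_loss(prompts))
```

Because sigma is computed from the learnable prompts, penalizing their pairwise similarity keeps them spread apart. Under this sketch's assumptions, that spread is what keeps the sampled prompts from collapsing onto the mean.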
Abstract
The popularization of Text-to-Image (T2I) diffusion models has enabled the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variations. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of our approach through quantitative analysis, including automatic evaluation and human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
