DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation

Brian Nlong Zhao; Yuhang Xiao; Jiashu Xu; Xinyang Jiang; Yifan Yang; Dongsheng Li; Laurent Itti; Vibhav Vineet; Yunhao Ge

TL;DR

DreamDistribution learns a distribution of soft prompts to customize pretrained text-to-image (T2I) diffusion models for personalization at the level of abstract attributes, without fine-tuning the diffusion model itself. It models the prompt distribution with multiple learnable embeddings and optimizes them in embedding space using a simple reparameterization trick and an orthogonal loss, so that prompts sampled from the distribution produce varied yet coherent images. The method supports text-guided editing and control over diversity, and can be adapted to other tasks such as text-to-3D. Quantitative and human evaluations on diverse reference sets show improved quality and diversity compared with baselines.
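
The ingredients named above (several learnable prompt embeddings, reparameterized sampling, an orthogonality penalty, and a knob for diversity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a per-dimension Gaussian fitted over the prompt embeddings and an orthogonality penalty based on pairwise cosine similarity; the function names `fit_gaussian`, `sample_prompt`, and `orthogonal_penalty`, and the `scale` parameter, are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import mean, pstdev


def fit_gaussian(prompts):
    """Per-dimension mean and std over K prompt embeddings (assumed diagonal Gaussian)."""
    dims = list(zip(*prompts))  # transpose: one tuple per embedding dimension
    mu = [mean(d) for d in dims]
    sigma = [pstdev(d) for d in dims]
    return mu, sigma


def sample_prompt(mu, sigma, scale=1.0, rng=random):
    """Reparameterized sample: mu + scale * sigma * eps, with eps ~ N(0, 1).

    `scale` is a hypothetical knob: values above 1 widen the spread of
    sampled prompts, and 0 collapses every sample to the mean.
    """
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def orthogonal_penalty(prompts):
    """Mean absolute cosine similarity over all distinct prompt pairs (assumed form)."""
    k = len(prompts)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(abs(cosine(prompts[i], prompts[j])) for i, j in pairs) / len(pairs)


# Toy usage with three 3-dimensional "prompt embeddings".
toy_prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_mu, toy_sigma = fit_gaussian(toy_prompts)
toy_sample = sample_prompt(toy_mu, toy_sigma, scale=1.0, rng=random.Random(0))
```

In a real training loop the embeddings would be tensors updated by gradient descent through the sampled prompt, which is what the reparameterized form makes possible; the plain-Python lists here only show the arithmetic.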

Abstract

The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variations. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of our approach through quantitative analysis, including automatic evaluation and human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution

Paper Structure (22 sections, 19 figures, 5 tables, 1 algorithm)

Introduction
Acknowledgment
More Implementation Details
Evaluation Set
Additional Result
Diverse Image Instance Generation
Text-guided Editing
Scaling Variance for Diversity Control
Composition of Distribution
Same Instance Personalization
Naive Adaptation of Baselines
Exploring with Different Granularity of Concepts
Experiment with Small Reference Set
3D Generation Diversity
Ablation study
...and 7 more sections

Figures (19)

Figure 1: In general, using more prompts improves both quality and diversity.
Figure 2: Choosing $\lambda$ between $1\times10^{-4}$ and $1\times10^{-2}$ generally achieves a good balance of quantitative metrics.
Figure 3: As with the number of prompts, increasing the number of prompt tokens also improves image quality and diversity.
Figure 4: Samples of reference images from our evaluation set. Numbers on the right represent the number of images in each set.
Figure 5: Samples of generated images using reference images from the evaluation set. Each row is generated using the reference images of the corresponding row in Figure 4.
...and 14 more figures
