DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation

Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge

TL;DR

DreamDistribution learns a distribution of soft prompts to customize pretrained T2I diffusion models for abstract, attribute-level personalization without fine-tuning the diffusion model. It represents the prompt distribution with multiple learnable prompt embeddings and optimizes them in embedding space using a simple reparameterization trick and an orthogonal loss, so that diverse prompts can be sampled to produce varied yet coherent images. The method supports text-guided editing and control over diversity, and can be adapted to other tasks such as text-to-3D. Quantitative and human evaluations on diverse reference sets show improved quality and diversity compared with baselines.
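The two training ingredients named above can be illustrated with a minimal, framework-free sketch. It assumes that (i) each of the K soft prompts is flattened into one embedding vector, (ii) the prompt distribution is a diagonal Gaussian whose mean and standard deviation are taken per dimension across the K prompts, and (iii) the orthogonal loss is the mean absolute cosine similarity between distinct prompts, added to the denoising loss with a weight `lam` (λ). All function names, the cosine form of the loss, and the default `lam=1e-3` are hypothetical choices for this sketch, not the authors' implementation; a real version would use an autodiff library so gradients reach the prompts.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]  # one soft prompt, flattened to a single embedding vector


def prompt_stats(prompts: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and (population) standard deviation over K prompts."""
    k = len(prompts)
    dim = len(prompts[0])
    mean = [sum(p[d] for p in prompts) / k for d in range(dim)]
    std = [
        math.sqrt(sum((p[d] - mean[d]) ** 2 for p in prompts) / k)
        for d in range(dim)
    ]
    return mean, std


def sample_prompt(prompts: List[Vector], rng: random.Random) -> Vector:
    """Reparameterized sample: mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework, gradients of the denoising loss would flow back
    to the K learnable prompts through mu and sigma; eps carries no gradient.
    """
    mean, std = prompt_stats(prompts)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def orthogonal_loss(prompts: List[Vector]) -> float:
    """Mean |cosine similarity| over all ordered pairs i != j (requires K >= 2).

    It is zero when the prompts are mutually orthogonal, so minimizing it
    discourages the K prompts from collapsing onto the same direction.
    """
    k = len(prompts)
    total = sum(
        abs(cosine(prompts[i], prompts[j]))
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return total / (k * (k - 1))


def total_loss(denoising_loss: float, prompts: List[Vector], lam: float = 1e-3) -> float:
    """Hypothetical combined objective: denoising loss + lam * orthogonal loss."""
    return denoising_loss + lam * orthogonal_loss(prompts)
```

For example, `orthogonal_loss([[1.0, 0.0], [0.0, 1.0]])` is 0.0 (orthogonal prompts), while `orthogonal_loss([[1.0, 0.0], [2.0, 0.0]])` is 1.0 (parallel prompts). In training, `denoising_loss` would come from the diffusion model conditioned on a prompt drawn with `sample_prompt`; here it is just a placeholder number.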

Abstract

The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variation. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of our approach through quantitative analysis, including automatic evaluation and human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
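The abstract mentions controlling variation and mixing multiple learned distributions. The following is a minimal sketch of one plausible way to do both, assuming each learned distribution is summarized by a per-dimension mean and standard deviation (a diagonal Gaussian). Variation is controlled by a hypothetical scale factor `gamma` on the standard deviation, and mixing is taken as a normalized weighted average of the means and standard deviations. The names `sample_scaled` and `mix_distributions` and both rules are illustrative assumptions, not necessarily the authors' formulation.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Gaussian = Tuple[Vector, Vector]  # (per-dimension mean, per-dimension std)


def sample_scaled(mean: Sequence[float], std: Sequence[float],
                  gamma: float, rng: random.Random) -> Vector:
    """Sample mu + gamma * sigma * eps with eps ~ N(0, I).

    gamma < 1 narrows the variation around the mean, gamma > 1 widens it,
    and gamma = 0 always returns the mean prompt.
    """
    return [m + gamma * s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def mix_distributions(dists: Sequence[Gaussian],
                      weights: Sequence[float]) -> Gaussian:
    """Weighted average of several (mean, std) pairs; weights are normalized."""
    total = sum(weights)
    w = [x / total for x in weights]
    dim = len(dists[0][0])
    mean = [sum(wi * d[0][k] for wi, d in zip(w, dists)) for k in range(dim)]
    std = [sum(wi * d[1][k] for wi, d in zip(w, dists)) for k in range(dim)]
    return mean, std
```

For instance, mixing `([0.0, 0.0], [1.0, 1.0])` and `([2.0, 4.0], [3.0, 3.0])` with equal weights gives mean `[1.0, 2.0]` and std `[2.0, 2.0]`; sampling from the result with a small `gamma` stays close to that blended mean prompt.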

Paper Structure (22 sections, 19 figures, 5 tables, 1 algorithm)

Figures (19)

  • Figure 1: Increasing the number of prompts generally improves both image quality and diversity.
  • Figure 2: Choosing $\lambda$ between $1\times10^{-4}$ and $1\times10^{-2}$ generally achieves a good balance across the quantitative metrics.
  • Figure 3: As with the number of prompts, increasing the number of prompt tokens also improves image quality and diversity.
  • Figure 4: Samples of reference images from our evaluation set. Numbers on the right represent the number of images in each set.
  • Figure 5: Samples of generated images using reference images from the evaluation set. Each row is generated using the reference images of the corresponding row in Figure 4.
  • ...and 14 more figures