Minority-Focused Text-to-Image Generation via Prompt Optimization

Soobin Um; Jong Chul Ye

TL;DR

This work tackles the challenge of generating minority samples in text-to-image diffusion by addressing the high-density bias of common samplers. It introduces MinorityPrompt, a token-based online prompt optimization method that appends a learnable token to user prompts and updates it during inference to encourage low-density features while preserving semantics, with a theoretical link to log-likelihood via a carefully crafted objective ${\cal J}_{\cal C}$. The authors demonstrate state-of-the-art performance in minority generation across multiple backbones (including SDv1.5, SDv2.0, and SDXL-LT), show robustness to various solvers, and provide extensive ablations and human studies. Beyond minority generation, they illustrate the framework’s versatility for promoting diversity and potential applicability to other inference-time optimization tasks, releasing code to facilitate adoption.

Abstract

We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. In the context of T2I generation, minority instances can be defined as those lying in low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (such as CFG) that are essential for high-quality generation. To address this, we present a novel framework that counters the high-density focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that encourages the emergence of desired properties during inference while preserving the semantic content of user-provided prompts. We then tailor this generic prompt optimizer into a specialized solver that promotes the generation of minority features by incorporating a carefully crafted likelihood objective. Extensive experiments across various types of T2I models demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers. Code is available at https://github.com/soobin-um/MinorityPrompt.
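
To make the idea of online prompt optimization concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a stand-in quadratic objective and a linear map `W` in place of a diffusion model; the names `optimize_appended_token`, `W`, and `target` are hypothetical. It only illustrates the general pattern described above: the user prompt embeddings stay frozen, a single appended token embedding is the only trainable variable, and that token is updated by gradient ascent on an objective.

```python
import numpy as np

def optimize_appended_token(prompt_emb, W, target, steps=50, lr=0.2):
    """Gradient ascent on a single appended token; the prompt stays frozen.

    Toy objective (an assumption, not the paper's objective):
        J(token) = -|| W @ mean([prompt_emb; token]) - target ||^2
    """
    n, d = prompt_emb.shape
    token = np.zeros(d)                   # learnable token, appended to the prompt
    history = []
    for _ in range(steps):
        cond = np.vstack([prompt_emb, token])   # conditioning = prompt + token
        pooled = cond.mean(axis=0)              # toy pooling over tokens
        residual = W @ pooled - target
        history.append(-float(residual @ residual))
        # Analytic gradient of J with respect to the appended token only.
        grad_token = -2.0 * (W.T @ residual) / (n + 1)
        token = token + lr * grad_token         # ascent step; prompt_emb untouched
    return token, history

rng = np.random.default_rng(0)
prompt_emb = rng.normal(size=(4, 8))      # hypothetical frozen prompt embeddings
W = rng.normal(size=(3, 8))               # hypothetical stand-in for the model
target = rng.normal(size=3)               # hypothetical "desired property"
frozen_copy = prompt_emb.copy()

token, history = optimize_appended_token(prompt_emb, W, target)
print(f"objective: {history[0]:.4f} -> {history[-1]:.4f}")
```

In an actual diffusion sampler the objective would be evaluated through the denoiser at each inference step, and the gradient would come from automatic differentiation rather than a closed form; the sketch keeps everything analytic so that it stays self-contained.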
