Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation

Michael Ogezi; Ning Shi

TL;DR

This work proposes NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning, and constructs Negative Prompts DB, a publicly available dataset of negative prompts.

Abstract

In text-to-image generation, negative prompts, which describe undesirable image characteristics, can significantly boost image quality. However, producing good negative prompts is manual and tedious. To address this, we propose NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning. Our combined approach yields a substantial 25% increase in Inception Score compared to other approaches and surpasses the ground-truth negative prompts from the test set. Furthermore, with NegOpt we can preferentially optimize the metrics most important to us. Finally, we construct Negative Prompts DB (https://huggingface.co/datasets/mikeogezi/negopt_full), a publicly available dataset of negative prompts.

Paper Structure (31 sections, 1 equation, 2 figures, 2 tables)

Introduction
Background
NegOpt
Dataset: Negative Prompts DB
Core Method: NegOpt
Phase 1: Supervised Fine-tuning (SFT)
Phase 2: Reinforcement Learning (RL)
Experimental Setup
SFT
SFT Dataset Subset
SFT Training
RL
RL Dataset Subset
RL Training
Evaluation
...and 16 more sections

Figures (2)

Figure 1: Images generated with NegOpt (SFT+RL specifically) negative prompts vs. baseline images.
Figure 2: In NegOpt, we first use a fine-tuned sequence-to-sequence language model to generate a negative prompt, $p'$, given a normal prompt, $p$. Next, we use $p$ and $p'$ to generate an image with an image generator. Finally, we further optimize our language model based on the reward received for the generated image.
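
The loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `negopt_step`, `weighted_reward`, the toy stand-in functions, the metric names, and the weights are all hypothetical, and the weighted sum is only one plausible way to favor some metrics over others. A real setup would replace the stand-ins with a fine-tuned sequence-to-sequence model, an image generator, and learned scorers, and would feed the reward to an RL update.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Step:
    prompt: str           # the normal prompt, p
    negative_prompt: str  # the generated negative prompt, p'
    reward: float         # scalar reward for the generated image


def weighted_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-metric scores into one scalar (assumed: a weighted sum)."""
    return sum(weights[name] * scores[name] for name in weights)


def negopt_step(
    prompt: str,
    generate_negative: Callable[[str], str],
    generate_image: Callable[[str, str], Any],
    score_image: Callable[[str, Any], Dict[str, float]],
    weights: Dict[str, float],
) -> Step:
    """One pass of the pipeline: p -> p' -> image -> reward."""
    negative = generate_negative(prompt)       # language model proposes p'
    image = generate_image(prompt, negative)   # image generator uses p and p'
    scores = score_image(prompt, image)        # per-metric scores for the image
    return Step(prompt, negative, weighted_reward(scores, weights))


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_negative(prompt: str) -> str:
    return "blurry, low quality"


def toy_image(prompt: str, negative: str) -> Dict[str, str]:
    return {"prompt": prompt, "negative": negative}


def toy_scores(prompt: str, image: Any) -> Dict[str, float]:
    return {"aesthetics": 0.5, "fidelity": 0.75}


step = negopt_step(
    "a castle at dusk",
    toy_negative,
    toy_image,
    toy_scores,
    weights={"aesthetics": 2.0, "fidelity": 1.0},  # assumed weights
)
print(step)
```

Raising the weight on one metric shifts the scalar reward toward it, which is one simple way to read the abstract's remark about preferentially optimizing the metrics that matter most.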
