Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation

Michael Ogezi, Ning Shi

TL;DR

This work proposes NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning, and constructs Negative Prompts DB, a publicly available dataset of negative prompts.

Abstract

In text-to-image generation, using negative prompts, which describe undesirable image characteristics, can significantly boost image quality. However, producing good negative prompts is manual and tedious. To address this, we propose NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning. Our combined approach results in a substantial increase of 25% in Inception Score compared to other approaches and surpasses ground-truth negative prompts from the test set. Furthermore, with NegOpt we can preferentially optimize the metrics most important to us. Finally, we construct Negative Prompts DB (https://huggingface.co/datasets/mikeogezi/negopt_full), a publicly available dataset of negative prompts.
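For readers unfamiliar with the Inception Score reported above, the following is a minimal sketch of its standard definition, $\mathrm{IS} = \exp\big(\mathbb{E}_x\,\mathrm{KL}(p(y \mid x)\,\|\,p(y))\big)$. In practice the class probabilities $p(y \mid x)$ come from a pretrained Inception classifier applied to each generated image. Here they are passed in as plain lists, and the function name `inception_score` is an illustrative assumption, not code from the paper.

```python
import math
from typing import List


def inception_score(probs: List[List[float]]) -> float:
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities.

    `probs[i][j]` is the probability of class j for image i. In a real
    evaluation these would come from a pretrained Inception classifier;
    this sketch only implements the final arithmetic.
    """
    n_images = len(probs)
    n_classes = len(probs[0])

    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[j] for row in probs) / n_images for j in range(n_classes)]

    # Average KL divergence between each conditional p(y|x) and the marginal.
    kl_total = 0.0
    for row in probs:
        kl_total += sum(p * math.log(p / m) for p, m in zip(row, marginal) if p > 0)

    return math.exp(kl_total / n_images)


# Confident and diverse predictions give the maximum score (the class count).
sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Uninformative predictions give the minimum score of 1.
flat = inception_score([[0.5, 0.5], [0.5, 0.5]])
print(sharp, flat)
```

Under this definition, a higher score means each image is classified confidently while the set as a whole covers many classes, which is why it serves as a proxy for both fidelity and diversity.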

Paper Structure

This paper contains 31 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Images generated with NegOpt (SFT+RL specifically) negative prompts vs. baseline images.
  • Figure 2: In NegOpt, we first use a fine-tuned sequence-to-sequence language model to generate a negative prompt, $p'$, given a normal prompt, $p$. Next, we use $p$ and $p'$ to generate an image with an image generator. Finally, we further optimize our language model based on the reward received for the generated image.
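The three steps described in the Figure 2 caption can be sketched as a single rollout function. This is a minimal illustration under stated assumptions, not the authors' implementation: `rollout`, `Rollout`, and the three toy stand-ins are hypothetical names, the image generator and reward are placeholders rather than a real diffusion model or aesthetic scorer, and the reinforcement learning update that would consume the reward is omitted because the text here does not specify it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rollout:
    prompt: str            # p, the normal prompt
    negative_prompt: str   # p', produced by the language model
    reward: float          # score of the image generated from (p, p')


def rollout(
    prompt: str,
    negative_model: Callable[[str], str],
    image_generator: Callable[[str, str], Any],
    reward_fn: Callable[[Any], float],
) -> Rollout:
    """Run one pass of the pipeline: p -> p' -> image -> reward."""
    negative_prompt = negative_model(prompt)
    image = image_generator(prompt, negative_prompt)
    return Rollout(prompt, negative_prompt, reward_fn(image))


# --- Hypothetical stand-ins so the sketch runs without any model ---

def toy_negative_model(prompt: str) -> str:
    # Stands in for the fine-tuned sequence-to-sequence language model.
    return "blurry, low quality"


def toy_image_generator(prompt: str, negative_prompt: str) -> dict:
    # Stands in for the image generator; returns a record instead of pixels.
    avoided = [term.strip() for term in negative_prompt.split(",")]
    return {"prompt": prompt, "avoided": avoided}


def toy_reward(image: dict) -> float:
    # Placeholder reward: counts avoided terms. A real reward would score
    # the image itself, for example on aesthetics or fidelity.
    return float(len(image["avoided"]))


result = rollout("a castle at sunset", toy_negative_model, toy_image_generator, toy_reward)
print(result)
```

Passing the three components as callables keeps the loop independent of any particular language model, image generator, or reward, which mirrors the caption's point that the reward signal is what drives further optimization of the language model.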