NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Shachar Rosenman, Vasudev Lal, Phillip Howard

TL;DR

NeuroPrompts is an adaptive, two-stage framework that automatically enhances user prompts for text-to-image diffusion models. It first adapts a language model to the style of human prompt engineers through supervised fine-tuning and PPO-based reinforcement learning, then applies NeuroLogic constrained decoding so that the output satisfies stylistic keyword clauses. The approach takes prefixes extracted from human prompts, generates optimized continuations, and evaluates the resulting images with PickScore and an aesthetics predictor; the optimized prompts yield higher image quality than both unoptimized and human-authored prompts. Experiments on DiffusionDB with Stable Diffusion show gains in aesthetics score (mean ~6.27) and predicted user preference (≈60% PickScore), with PPO contributing substantial improvements and NeuroLogic enabling user-controllable constraints. The work provides an accessible prompt-assistance tool for artists and designers, while acknowledging limitations such as evaluation confined to Stable Diffusion and potential societal biases in generated content.

Abstract

Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code and a screencast video demo of NeuroPrompts publicly available.
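
To make "constraint set specification" concrete, the sketch below treats a constraint set as a list of clauses, where each clause is a group of alternative stylistic keywords and a prompt satisfies the set when at least one keyword from every clause appears in it. This is an illustration only: the function names, the example keywords, and the case-insensitive substring matching are assumptions, not the authors' implementation. It also only checks a finished prompt, whereas NeuroLogic decoding steers generation toward satisfying the clauses during the search itself.

```python
from typing import List, Sequence

# A clause is a group of alternative keywords; one of them must appear.
Clause = Sequence[str]


def unmet_clauses(prompt: str, clauses: Sequence[Clause]) -> List[Clause]:
    """Return every clause for which no keyword occurs in the prompt."""
    text = prompt.lower()
    return [
        clause
        for clause in clauses
        if not any(keyword.lower() in text for keyword in clause)
    ]


def satisfies(prompt: str, clauses: Sequence[Clause]) -> bool:
    """True when each clause has at least one keyword in the prompt."""
    return not unmet_clauses(prompt, clauses)


# Hypothetical constraint set: one clause for medium, one for detail level.
constraints = [
    ["oil painting", "watercolor"],
    ["highly detailed", "8k"],
]

enhanced = "a castle on a hill, oil painting, highly detailed, dramatic lighting"
print(satisfies(enhanced, constraints))
print(unmet_clauses("a castle on a hill", constraints))
```

Grouping alternatives inside a clause lets a user ask for "some medium" and "some level of detail" without fixing the exact wording, which leaves the choice among alternatives to the language model.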

Paper Structure (28 sections, 3 equations, 1 figure, 3 tables)