Fast Prompt Alignment for Text-to-Image Generation

Khalil Mrini; Hanlin Lu; Linjie Yang; Weilin Huang; Heng Wang

TL;DR

Fast Prompt Alignment (FPA) addresses the slow, iterative nature of text-to-image (T2I) prompt optimization by folding the gains of iterative optimization into a single-pass workflow. It uses a large language model (LLM) to paraphrase prompts, then either fine-tunes a 7B model for real-time inference or applies in-context learning with a 123B model to produce optimized prompts on the fly. On COCO Captions and PartiPrompts, FPA achieves competitive TIFA and VQA alignment scores with substantial speedups, and a human study shows a strong correlation between human judgments and the automated metrics. The results suggest that FPA is a scalable option for real-time, high-demand T2I applications; the code is released to support adoption and further research.

Abstract

Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details. This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach, enhancing text-to-image alignment efficiency without the iterative overhead typical of current methods like OPT2I. FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts to enable real-time inference, reducing computational demands while preserving alignment fidelity. Extensive evaluations on the COCO Captions and PartiPrompts datasets demonstrate that FPA achieves competitive text-image alignment scores at a fraction of the processing time, as validated through both automated metrics (TIFA, VQA) and human evaluation. A human study with expert annotators further reveals a strong correlation between human alignment judgments and automated scores, underscoring the robustness of FPA's improvements. The proposed method showcases a scalable, efficient alternative to iterative prompt optimization, enabling broader applicability in real-time, high-demand settings. The codebase is provided to facilitate further research: https://github.com/tiktok/fast_prompt_alignment
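To make the in-context learning route more concrete, the following is a minimal sketch of how a few-shot prompt could be assembled from pairs of original and optimized prompts. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_icl_prompt`, the instruction wording, and the `Original:` / `Optimized:` layout are all hypothetical, and the example pair is made up for demonstration.

```python
from typing import List, Tuple


def build_icl_prompt(examples: List[Tuple[str, str]], user_prompt: str) -> str:
    """Assemble a few-shot prompt from (original, optimized) prompt pairs.

    Hypothetical helper: the instruction text and field labels below are
    assumptions made for illustration, not taken from the FPA codebase.
    """
    lines = [
        "Rewrite each text-to-image prompt so the generated image matches it more closely.",
        "",
    ]
    # Each demonstration shows the LLM one original prompt and its optimized paraphrase.
    for original, optimized in examples:
        lines.append(f"Original: {original}")
        lines.append(f"Optimized: {optimized}")
        lines.append("")
    # The new prompt is appended last, leaving the optimized version for the LLM to complete.
    lines.append(f"Original: {user_prompt}")
    lines.append("Optimized:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo_pairs = [("a cat", "a photo of a cat sitting on a sofa")]  # made-up pair
    print(build_icl_prompt(demo_pairs, "a dog"))
```

The resulting string would be sent to the LLM in a single call, which is what removes the per-prompt optimization loop at inference time.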
