Preference-Guided Prompt Optimization for Text-to-Image Generation

Zhipeng Li, Yi-Chi Liao, Christian Holz

TL;DR

This work tackles the difficulty of guiding text-to-image generation with prompts by proposing APPO, a preference-guided prompt optimization framework that relies on binary user feedback and an adaptive exploration strategy. APPO uses three prompt-generation strategies—retainment, alignment, and expansion—and a CLIP-based adaptive expansion policy to balance exploration and exploitation, achieving satisfactory results in fewer iterations with lower cognitive load. The authors validate APPO through synthetic tests and a user study, showing superior efficiency and comparable or better output quality versus baselines such as PromptCharm, DSPy, and Clarification, while reducing user effort. The findings demonstrate the potential of sparse, preference-based feedback to drive effective human-AI collaboration in generative tasks and point to broad applicability beyond image generation.

Abstract

Generative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools heavily rely on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and demands convergence within few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating prompts, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show APPO enables achieving satisfactory outcomes in fewer iterations with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.

Paper Structure (74 sections, 12 figures, 3 tables, 3 algorithms)


Figures (12)

  • Figure 1: Preference-driven generation workflow enabled by APPO, with image generation as an example. The user begins with an initial prompt specifying the objects to be included in the generated images (top left). In the first iteration, the optimizer expands this prompt to explore diverse possibilities and generate multiple prompt variants (right). The generative model then produces outputs corresponding to these variants (down), which are presented to the user in a gallery (left). The user selects their preferred results (up), and these selections are fed into the optimizer, which infers the user's preferences and generates refined prompts. The generative model uses these refined prompts to produce new outputs, which the user evaluates in subsequent iterations.
  • Figure 2: The optimization concept behind APPO, illustrated with a concrete example. In each iteration, APPO takes the preferred prompts (green) and non-preferred prompts (red) from the previous iteration as input and applies three strategies. First, it retains the preferred prompts. In parallel, the expansion strategy applies evolutionary operations to explore new prompts: crossover mixes elements of the preferred prompts (words in different shades of green), and mutation then generates additional prompts (orange). Simultaneously, the alignment strategy estimates a textual gradient from both preferred and non-preferred prompts to identify elements generally favored by the user (blue). This gradient is then applied to the non-preferred prompts to better align them with user preferences. Finally, the retainment, expansion, and alignment prompts together form the set of prompts for the next iteration.
  • Figure 3: User study interface and intermediate results from APPO. Participants are presented with nine candidate images and select one or more that best align with their generation goal. They can indicate whether they are satisfied with the current iteration’s outcome or wish to see additional candidates. In the results shown here, iteration 3 (middle) converges toward the “old-film” theme selected by the user in previous iterations (left), demonstrating that APPO effectively guides generative models toward user preferences. By iteration 5 (right), APPO detects that the optimization has converged to a local optimum and responds by exploring unknown directions to continue improving the results.
  • Figure 4: Number of iterations and total time (in seconds) required to achieve satisfactory outcomes for both close- and open-ended tasks across three conditions (means and standard deviations). (*: $p < 0.05$, **: $p < 0.01$, ***: $p < 0.001$).
  • Figure 5: NASA-TLX questionnaire results for both close- and open-ended tasks across all four conditions (means and standard deviations). (*: $p < 0.05$, **: $p < 0.01$, ***: $p < 0.001$).
  • ...and 7 more figures
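
The three strategies described in the Figure 2 caption (retainment, expansion via crossover and mutation, and alignment via a textual gradient) can be illustrated with a toy sketch. This is not the authors' implementation: the excerpt does not say how APPO realizes these operators, so the sketch assumes prompts are comma-separated phrases and substitutes simple set operations for the crossover, mutation, and textual-gradient steps. All function names, the `vocabulary` parameter, and the example prompts are hypothetical, and the CLIP-based adaptive expansion policy is omitted entirely.

```python
import random
from itertools import combinations


def split(prompt):
    """Break a comma-separated prompt into its phrases."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


def join(parts):
    return ", ".join(parts)


def crossover(a, b):
    """Toy crossover: first half of one parent plus second half of the other."""
    pa, pb = split(a), split(b)
    child = pa[: (len(pa) + 1) // 2] + pb[len(pb) // 2:]
    return join(dict.fromkeys(child))  # drop duplicates, keep order


def mutate(prompt, vocabulary, rng):
    """Toy mutation: append one unused phrase from a hypothetical vocabulary."""
    parts = split(prompt)
    options = [v for v in vocabulary if v not in parts]
    if options:
        parts.append(rng.choice(options))
    return join(parts)


def textual_gradient(preferred, rejected):
    """Stand-in for the textual gradient: phrases seen only in preferred
    prompts are 'favored'; phrases seen only in rejected ones are 'disfavored'."""
    liked = {p for prompt in preferred for p in split(prompt)}
    disliked = {p for prompt in rejected for p in split(prompt)}
    return liked - disliked, disliked - liked


def align(prompt, gradient):
    """Apply the gradient: remove disfavored phrases, add favored ones."""
    favored, disfavored = gradient
    parts = [p for p in split(prompt) if p not in disfavored]
    parts += sorted(favored - set(parts))
    return join(parts)


def next_prompts(preferred, rejected, vocabulary, seed=0):
    """Build the next iteration's prompt set from binary feedback."""
    rng = random.Random(seed)
    gradient = textual_gradient(preferred, rejected)
    return {
        "retainment": list(preferred),
        "expansion": [
            mutate(crossover(a, b), vocabulary, rng)
            for a, b in combinations(preferred, 2)
        ],
        "alignment": [align(p, gradient) for p in rejected],
    }


preferred = ["a cat, old film, grainy", "a cat, sepia tone, soft light"]
rejected = ["a cat, neon colors, cartoon"]
out = next_prompts(preferred, rejected, vocabulary=["vignette", "35mm"])
for strategy, prompts in out.items():
    print(strategy, prompts)
```

In this toy run, the rejected prompt loses "neon colors" and "cartoon" and gains the phrases found only in the preferred prompts, which mirrors the caption's description of applying the gradient to non-preferred prompts. A real system would likely need a language model or embedding-based operators in place of these set operations.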