Preference-Guided Prompt Optimization for Text-to-Image Generation
Zhipeng Li, Yi-Chi Liao, Christian Holz
TL;DR
This work tackles the difficulty of guiding text-to-image generation with prompts by proposing APPO, a preference-guided prompt optimization framework that relies on binary user feedback and an adaptive exploration strategy. APPO uses three prompt-generation strategies—retainment, alignment, and expansion—and a CLIP-based adaptive expansion policy to balance exploration and exploitation, achieving satisfactory results in fewer iterations with lower cognitive load. The authors validate APPO through synthetic tests and a user study, showing superior efficiency and comparable or better output quality versus baselines such as PromptCharm, DSPy, and Clarification, while reducing user effort. The findings demonstrate the potential of sparse, preference-based feedback to drive effective human-AI collaboration in generative tasks and point to broad applicability beyond image generation.
Abstract
Generative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools rely heavily on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and optimization must converge within a few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating on prompts themselves, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show that APPO enables users to reach satisfactory outcomes in fewer iterations and with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.
