Prompt Performance Prediction for Image Generation
Nicolas Bizzozzero, Ihab Bendidi, Olivier Risser-Maroix
TL;DR
Prompt Performance Prediction (PPP) is the task of predicting a prompt's effectiveness for image generation before any image is produced. It is formalized as the regression $R(T_i)=f_{\theta}(T_i)+\epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma^2)$, where the score of prompt $T_i$ is the mean score of its $J$ generated images $I_{i,j}$, $R(T_i)=\frac{1}{J}\sum_{j=1}^J R(I_{i,j})$, and the parameters are learned by maximum likelihood, $\theta^* = \arg\max_{\theta} \sum_i \log p(R_i|T_i;\theta)$. The study evaluates PPP on three prompt-image datasets and three art-domain datasets, using ground-truth relevance proxies from six pre-trained predictors and CLIP-based representations. CLIP-based textual embeddings deliver the strongest PPP signals, outperforming other language models, though a modality gap between image and text representations can limit cross-modal predictions. The results indicate that PPP can guide prompt design and reformulation in generative IR, enabling proactive optimization and resource-efficient content creation, with implications for user experience and model feedback.
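As an illustration of the formalization above, the following minimal sketch builds prompt-level targets by averaging per-image scores and fits a linear $f_{\theta}$. Under Gaussian noise with fixed variance, maximizing the log-likelihood is equivalent to minimizing squared error, so ordinary least squares is used. The data are synthetic, and the linear model, the dimensions, and all variable names are assumptions made for the example, not the predictors or embeddings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: N prompts, D-dimensional text embeddings,
# J generated images per prompt (all values are illustrative assumptions).
N, D, J = 200, 16, 4
T = rng.normal(size=(N, D))        # stand-in for prompt embeddings T_i
true_w = rng.normal(size=D)        # unknown "true" weights of the toy model

# Per-image scores R(I_{i,j}): a prompt-dependent signal plus Gaussian noise.
image_scores = (T @ true_w)[:, None] + 0.1 * rng.normal(size=(N, J))

# Prompt-level target R(T_i): mean of the J per-image scores.
R = image_scores.mean(axis=1)

# Linear f_theta with a bias term. With epsilon ~ N(0, sigma^2), the
# maximum-likelihood theta is the least-squares solution.
X = np.hstack([T, np.ones((N, 1))])
theta, *_ = np.linalg.lstsq(X, R, rcond=None)

pred = X @ theta
mse = float(np.mean((pred - R) ** 2))
print(f"training MSE: {mse:.5f}")
```

A non-linear $f_{\theta}$ would be trained the same way with a squared-error loss; only the closed-form solve would be replaced by an iterative optimizer.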
Abstract
Predicting the performance of a query before results are returned is a longstanding challenge in Information Retrieval (IR). Inspired by this problem, we introduce a novel task called "Prompt Performance Prediction" (PPP), which aims to predict the performance of a prompt before the generated images are obtained. We demonstrate the plausibility of the task by measuring the correlation coefficient between predicted and actual performance scores on three datasets of prompts paired with generated images and on three art-domain datasets of real images with real user appreciation ratings. Our results show promising performance prediction capabilities, suggesting potential applications for optimizing user prompts.
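The evaluation described above compares predicted and actual scores through a correlation coefficient. The abstract does not say which coefficient is used, so the sketch below assumes Pearson's r; the function name and the toy score lists are hypothetical.

```python
import numpy as np

def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual performance scores."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    p_c = p - p.mean()  # center both series
    a_c = a - a.mean()
    # Covariance divided by the product of standard deviations.
    return float((p_c @ a_c) / np.sqrt((p_c @ p_c) * (a_c @ a_c)))

# Toy scores for four prompts (illustrative values only).
actual = [0.2, 0.5, 0.9, 0.4]
predicted = [0.25, 0.45, 0.8, 0.5]
r = pearson_r(predicted, actual)
print(f"Pearson r = {r:.3f}")
```

A rank-based coefficient such as Spearman's could be substituted when only the ordering of prompts matters.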
