Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection
Cilin Yan, Jingyun Wang, Lin Zhang, Ruihui Zhao, Xiaopu Wu, Kai Xiong, Qingsong Liu, Guoliang Kang, Yangyang Kang
TL;DR
Automatic prompt optimization often relies on failure-focused feedback and exemplar retrieval, but prior approaches neglect historical feedback and do not select exemplars for task performance. The authors propose Exemplar-Guided Reflection with Memory (ERM), a framework that uses an instructive meta-prompt to generate diverse exemplars with detailed solutions, stores feedback in a Feedback Memory, and pre-assesses exemplars in an Exemplar Factory. ERM retrieves high-quality exemplars during inference and uses memory-driven feedback to accelerate prompt refinement, achieving notable gains (e.g., LIAR F1 +10.1; WebNLG ROUGE-L +3.9) and roughly halving the number of optimization steps compared to ProTeGi. Overall, ERM shows improvements across tasks and offers a scalable path to more efficient and accurate prompt optimization.
Abstract
Automatic prompt engineering aims to enhance the generation quality of large language models (LLMs). Recent works use feedback generated from erroneous cases to guide prompt optimization. During inference, they may further retrieve several semantically related exemplars and concatenate them to the optimized prompt to improve performance. However, those works only use the feedback at the current step, ignoring historical and unselected feedback that is potentially beneficial. Moreover, the selection of exemplars only considers general semantic similarity and may not be optimal in terms of task performance or fit with the optimized prompt. In this work, we propose an Exemplar-Guided Reflection with Memory mechanism (ERM) to realize more efficient and accurate prompt optimization. Specifically, we design an exemplar-guided reflection mechanism in which feedback generation is additionally guided by the generated exemplars. We further build two kinds of memory to fully utilize historical feedback and to support more effective exemplar retrieval. Empirical evaluations show that our method surpasses previous state-of-the-art methods with fewer optimization steps, e.g., improving the F1 score by 10.1 on the LIAR dataset and requiring about half the optimization steps of ProTeGi.
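To make the idea of the two memories more concrete, the following is a minimal sketch of a scored store that could hold either feedback or exemplars. It is not the authors' implementation: the names `MemoryItem` and `ScoredMemory`, the priority score, the update step, and the forgetting threshold are all hypothetical, chosen only to illustrate how stored items might be ranked, reused, and dropped based on whether they helped.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str           # a feedback string or an exemplar with its solution
    score: float = 1.0  # hypothetical priority score, not from the paper


@dataclass
class ScoredMemory:
    """Toy store for feedback or exemplars (illustrative assumption)."""
    items: list = field(default_factory=list)
    forget_below: float = 0.2  # hypothetical forgetting threshold

    def add(self, text, score=1.0):
        self.items.append(MemoryItem(text, score))

    def top_k(self, k):
        # Return the k items with the highest priority score.
        return sorted(self.items, key=lambda it: it.score, reverse=True)[:k]

    def update(self, item, improved, step=0.5):
        # Reward items that led to a better prompt, penalize the rest,
        # then drop items whose score fell below the threshold.
        item.score += step if improved else -step
        self.items = [it for it in self.items if it.score >= self.forget_below]


# Usage: reuse the best stored feedback at a later optimization step.
feedback_memory = ScoredMemory()
feedback_memory.add("Ask the model to check the speaker's stated source.")
feedback_memory.add("Require a one-word label as the final answer.", score=2.0)
best = feedback_memory.top_k(1)[0]
feedback_memory.update(best, improved=True)
```

In this sketch, whether a refinement "improved" the prompt would be decided by evaluating the refined prompt on held-out data; that evaluation loop is omitted here.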
