
Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection

Cilin Yan, Jingyun Wang, Lin Zhang, Ruihui Zhao, Xiaopu Wu, Kai Xiong, Qingsong Liu, Guoliang Kang, Yangyang Kang

TL;DR

Automatic prompt optimization often relies on failure-focused feedback and exemplar retrieval, but prior approaches discard historical feedback and do not select exemplars for task performance. The authors propose Exemplar-Guided Reflection with Memory (ERM), a framework that uses an instructive meta-prompt to generate diverse exemplars with detailed solutions, stores feedback in a Feedback Memory, and pre-assesses exemplars in an Exemplar Factory. ERM retrieves high-quality exemplars during inference and leverages memory-driven feedback to accelerate prompt refinement, achieving notable gains (e.g., LIAR F1 +10.1; WebNLG Rouge-L +3.9) and roughly halving optimization steps compared to ProTeGi. Overall, ERM demonstrates strong cross-task improvements and offers a scalable path to more efficient and accurate prompt optimization.

Abstract

Automatic prompt engineering aims to enhance the generation quality of large language models (LLMs). Recent works use feedback generated from erroneous cases to guide prompt optimization. During inference, they may further retrieve several semantically related exemplars and concatenate them to the optimized prompts to improve performance. However, those works only use the feedback from the current step, ignoring historical and unselected feedback that is potentially beneficial. Moreover, the selection of exemplars only considers general semantic similarity and may not be optimal in terms of task performance and compatibility with the optimized prompt. In this work, we propose an Exemplar-Guided Reflection with Memory mechanism (ERM) to realize more efficient and accurate prompt optimization. Specifically, we design an exemplar-guided reflection mechanism in which feedback generation is additionally guided by the generated exemplars. We further build two kinds of memory to fully exploit historical feedback and to support more effective exemplar retrieval. Empirical evaluations show that our method surpasses previous state-of-the-art methods with fewer optimization steps, e.g., improving the F1 score by 10.1 on the LIAR dataset and halving the number of optimization steps relative to ProTeGi.

Paper Structure

This paper contains 22 sections, 9 equations, 27 figures, 14 tables.

Figures (27)

  • Figure 1: Feedback-based automatic prompt engineering methods commonly employ a meta-prompt, which guides LLMs to evaluate the current case, provide feedback, and generate refined prompts. In this work, we design an instructive meta-prompt to select exemplars with detailed solution processes and to generate feedback for the current case. This feedback is stored in Feedback Memory and periodically retrieved to efficiently guide the optimization of prompts. Additionally, the exemplars are stored and assessed in an Exemplar Factory to enhance prediction accuracy.
  • Figure 2: Pipeline of ERM. For wrongly predicted samples, the instructive reflective meta-prompt is employed to select exemplars with detailed answer processes, followed by feedback generation. The feedback is stored in feedback memory storage, and the exemplars are stored in exemplar memory storage. Stored feedback is periodically retrieved to efficiently guide prompt optimization, with selective forgetting based on its effectiveness in enhancing optimization. Additionally, the exemplars are assessed to enhance prediction accuracy.
  • Figure 3: The efficiency of our approach ERM. The size of the circles represents performance, with larger circles indicating better performance. The vertical axis shows the optimization steps needed for different methods to achieve peak performance across datasets.
  • Figure 4: Instructive reflection meta-prompt.
  • Figure 5: Optimization meta-prompt.
  • ...and 22 more figures
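
The captions for Figures 1 and 2 describe a Feedback Memory that stores feedback, retrieves it periodically, and selectively forgets entries that do not help optimization. The following is a minimal Python sketch of that idea, not the authors' implementation. The class name `FeedbackMemory`, the additive score update, the `forget_below` threshold, and the example feedback strings are all assumptions made for illustration; the summary above does not specify how ERM scores or forgets feedback.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackMemory:
    """Toy feedback store with priority scores (hypothetical design)."""

    forget_below: float = 0.0  # assumed forgetting threshold
    scores: dict = field(default_factory=dict)  # feedback text -> score

    def store(self, feedback: str, score: float = 1.0) -> None:
        # Keep the existing score if the same feedback is stored again.
        self.scores.setdefault(feedback, score)

    def retrieve(self, k: int) -> list:
        # Return up to k feedback entries, highest score first.
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[:k]

    def update(self, feedback: str, improved: bool, step: float = 1.0) -> None:
        # Reward feedback that led to a better prompt, penalize it otherwise,
        # and forget it once its score drops below the threshold.
        if feedback not in self.scores:
            return
        self.scores[feedback] += step if improved else -step
        if self.scores[feedback] < self.forget_below:
            del self.scores[feedback]


# Usage: the second entry fails to improve the prompt twice and is forgotten.
memory = FeedbackMemory()
memory.store("Check the speaker's history before labeling a claim.")
memory.store("Ignore tone; judge factual content only.")
memory.update("Ignore tone; judge factual content only.", improved=False)
memory.update("Ignore tone; judge factual content only.", improved=False)
retrieved = memory.retrieve(k=2)
print(retrieved)
```

In this sketch, whether a prompt "improved" would be decided by an external evaluation on a held-out set, which is left out here. An Exemplar Factory could follow the same store, score, and retrieve pattern with exemplars in place of feedback, though that is also an assumption rather than something the captions state.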