PACE: Improving Prompt with Actor-Critic Editing for Large Language Model

Yihong Dong; Kangcheng Luo; Xue Jiang; Zhi Jin; Ge Li

TL;DR

PACE addresses the variability in prompt quality for LLMs with an RL-inspired actor-critic framework in which LLMs play both roles: actors execute the prompt and critics evaluate the resulting responses. The prompt is treated as a policy and iteratively refined using this feedback, with the goal of maximizing the task score $s(p, X, Y)$ of prompt $p$ on input-output pairs $(X, Y)$, that is, $p^{\star} = \arg\max_p \mathbb{E}_{(X, Y)} s(p, X, Y)$. Experiments across 24 Instruction Induction tasks and 21 Big-Bench tasks show that PACE yields large gains for medium/low-quality prompts, can match or surpass high-quality prompts, and outperforms prior methods such as APE, with robustness across multiple LLMs. The results highlight the potential of automatic prompt generation and refinement to reduce human effort and improve applicability, while acknowledging computational costs and limitations on highly complex problems.
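To make the objective concrete, the expectation over $(X, Y)$ can be approximated by an average over a small set of labeled examples, and the argmax by a search over a finite list of candidate prompts. The following is a minimal sketch under those assumptions; `mean_score`, `best_prompt`, `exact_match`, and `toy_model` are hypothetical names introduced for illustration, and `toy_model` is a stand-in for a real LLM call, not part of PACE.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input X, expected output Y)


def exact_match(pred: str, gold: str) -> float:
    """Per-example score: 1.0 on an exact match, else 0.0 (an assumed metric)."""
    return 1.0 if pred.strip() == gold.strip() else 0.0


def mean_score(
    prompt: str,
    examples: Sequence[Example],
    run_model: Callable[[str, str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Empirical estimate of E_{(X,Y)} s(p, X, Y) over the given examples."""
    total = sum(metric(run_model(prompt, x), y) for x, y in examples)
    return total / len(examples)


def best_prompt(
    candidates: Sequence[str],
    examples: Sequence[Example],
    run_model: Callable[[str, str], str],
) -> str:
    """Approximate argmax_p by scoring each candidate prompt."""
    return max(candidates, key=lambda p: mean_score(p, examples, run_model))


def toy_model(prompt: str, x: str) -> str:
    """Hypothetical stand-in for an LLM: uppercases only when asked to."""
    return x.upper() if "uppercase" in prompt.lower() else x


examples = [("cat", "CAT"), ("dog", "DOG")]
candidates = ["Repeat the word.", "Write the word in uppercase."]
chosen = best_prompt(candidates, examples, toy_model)
```

Here the second candidate scores 1.0 and the first scores 0.0, so `chosen` is the uppercase prompt. PACE itself does not enumerate a fixed candidate list; it edits a single prompt iteratively.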

Abstract

Large language models (LLMs) have shown remarkable potential across various tasks by conditioning on prompts. However, differences in the quality of human-written prompts lead to substantial discrepancies in LLM performance, and improving prompts usually requires considerable human effort and expertise. To this end, this paper proposes Prompt with Actor-Critic Editing (PACE), which enables automatic prompt editing for LLMs. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE uses LLMs in the dual roles of actors and critics and treats the prompt as a type of policy. PACE refines the prompt by taking into account feedback from both the actors, which execute the prompt, and the critics, which critique the responses. Because this feedback comes from the LLMs' real responses and reasoning, the process helps align the prompt with a specific task. We conduct extensive experiments on 24 Instruction Induction tasks and 21 Big-Bench tasks. Experimental results indicate that PACE improves the relative performance of medium/low-quality human-written prompts by up to 98%, bringing them to a level comparable to high-quality human-written prompts. Moreover, PACE also shows notable efficacy for prompt generation.
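The editing loop described above can be sketched as follows. This is an illustrative outline only, assuming one actor and one critic call per example and a single editor call per iteration; the function names, the stub behaviors, and the update rule are hypothetical and do not reproduce the paper's prompts, actor/critic counts, or algorithm details.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)


def pace_edit(
    prompt: str,
    examples: Sequence[Example],
    actor: Callable[[str, str], str],
    critic: Callable[[str, str, str, str], str],
    editor: Callable[[str, List[str]], str],
    iterations: int = 2,
) -> str:
    """Refine a prompt using actor responses and critic feedback."""
    for _ in range(iterations):
        critiques: List[str] = []
        for x, y in examples:
            response = actor(prompt, x)  # actor executes the prompt
            critiques.append(critic(prompt, x, y, response))  # critic reviews it
        prompt = editor(prompt, critiques)  # prompt is edited from the feedback
    return prompt


# Hypothetical stubs standing in for LLM calls.
def toy_actor(prompt: str, x: str) -> str:
    return x.upper() if "uppercase" in prompt.lower() else x


def toy_critic(prompt: str, x: str, y: str, response: str) -> str:
    return "ok" if response == y else "The output should be uppercase."


def toy_editor(prompt: str, critiques: List[str]) -> str:
    if any(c != "ok" for c in critiques):
        return prompt + " Write the answer in uppercase."
    return prompt


examples = [("cat", "CAT"), ("dog", "DOG")]
refined = pace_edit("Repeat the word.", examples, toy_actor, toy_critic, toy_editor)
```

With these stubs, the first iteration collects critiques and appends an instruction to the prompt, and the second iteration finds nothing left to fix, so the prompt is left unchanged. In a real setting each stub would be replaced by an LLM call with its own role-specific instructions.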

Paper Structure (36 sections, 4 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
PACE
Actor-Critic Paradigm
Iterative Algorithm
Experiment Setup
Benchmarks.
Implementation Details.
Experimental Results
The Effect of PACE in Prompt Editing
Ablation Study
Comparison with different LLMs
Effect of Iteration numbers
Related Work
Automatic prompt engineering with training.
Automatic prompt engineering without training.
...and 21 more sections

Figures (6)

Figure 1: Performance of human-written prompts on ten tasks from the Instruction Induction dataset, where each task has about eight human-written prompts and the absolute performance difference between prompts ranges from 29% to 93% per task (refer to the appendix for detailed results).
Figure 2: The paradigm of PACE.
Figure 3: The Performance of PACE under Various Initial Prompts.
Figure 4: Ablation Study of PACE on Both Public Benchmark Datasets.
Figure 5: Performance of PACE with Different LLMs.
...and 1 more figures
