
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers

Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen

TL;DR

This work approaches automatic prompt optimization for LLMs through an analogy with gradient-based model optimizers. It identifies two core factors, the update direction and the update method, and systematically analyzes the design choices for each. These include using only the current prompt's information (analogous to the descent direction) versus also using the optimization trajectory (analogous to momentum), and learning-rate-like constraints on edit distance. Building on these findings, the authors propose GPO (Gradient-inspired LLM-based Prompt Optimizer). GPO uses prompts retrieved from the optimization trajectory as the update direction. Its update method is generation-based refinement, with the edit distance constrained by a cosine-based decay schedule. Experiments on BBH, MMLU, GSM8K, WSC, and WebNLG show that GPO achieves substantial improvements over strong baselines and existing LLM-based optimizers while remaining efficient. The work also discusses limitations and future directions, such as exploring more advanced optimization methods and numeric signals for meta-prompt updates.
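To make the "update direction" idea concrete, here is a minimal Python sketch of retrieving relevant prompts from the optimization trajectory. It is not GPO's implementation. The `Step` record, the `retrieve_direction` helper, and the use of word-level Jaccard overlap as the relevance measure are hypothetical stand-ins, and the paper's actual retrieval criterion may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Step:
    """One trajectory entry: a task prompt and its score on held-out data."""
    prompt: str
    score: float


def _words(text: str) -> set[str]:
    # Lower-cased word set; a crude, hypothetical stand-in for a semantic embedding.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two prompts' word sets (1.0 for two empty prompts)."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def retrieve_direction(trajectory: list[Step], current: str, k: int = 2) -> list[Step]:
    """Return up to k earlier prompts most similar to the current prompt.

    The retrieved prompts and their scores would be shown to the LLM as the
    update direction, loosely playing the role that momentum plays in SGD.
    """
    past = [s for s in trajectory if s.prompt != current]
    past.sort(key=lambda s: similarity(s.prompt, current), reverse=True)
    return past[:k]


trajectory = [
    Step("Let's think step by step.", 0.61),
    Step("Answer the question directly.", 0.48),
    Step("Think step by step and check each step.", 0.66),
]
for s in retrieve_direction(trajectory, "Let's think step by step.", k=1):
    print(s.prompt, s.score)  # prints: Think step by step and check each step. 0.66
```

Retrieving by relevance rather than taking the whole history keeps the meta-prompt short. Swapping in an embedding-based similarity would only change `similarity`.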

Abstract

Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective to investigate the design of LLM-based prompt optimizers, by drawing an analogy with gradient-based model optimizers. To connect these two approaches, we identify two pivotal factors in model parameter learning: update direction and update method. By systematically analyzing a rich set of improvement strategies on the two aspects, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, it first retrieves relevant prompts from the optimization trajectory as the update direction. Then, it utilizes the generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 62.6% on MMLU compared to baseline methods. The code is available at https://github.com/RUCAIBox/GPO.
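The cosine-based decay of the edit distance mirrors cosine learning-rate annealing. The sketch below assumes that edit distance is counted in words and that it decays from a hypothetical `max_words` budget to `min_words`. The exact form of the schedule, its endpoints, and whether GPO enforces the budget by filtering candidates (as done here) or by stating it in the meta-prompt are all assumptions rather than details taken from the paper.

```python
import math


def edit_budget(step: int, total_steps: int, max_words: int = 40, min_words: int = 5) -> int:
    """Cosine-decayed cap on how many words a refinement may change at `step`.

    Hypothetical parameters: the budget starts at max_words (step 0) and
    decays to min_words at the final step, like cosine LR annealing.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(step, total_steps) / total_steps
    value = min_words + 0.5 * (max_words - min_words) * (1 + math.cos(math.pi * progress))
    return round(value)


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated words."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                 # delete wx
                           cur[j - 1] + 1,              # insert wy
                           prev[j - 1] + (wx != wy)))   # substitute or match
        prev = cur
    return prev[-1]


def within_budget(old: str, new: str, step: int, total_steps: int) -> bool:
    """Keep a candidate prompt only if its edit fits the current budget."""
    return word_edit_distance(old, new) <= edit_budget(step, total_steps)


# Large rewrites are allowed early on; near the end only small edits pass.
print([edit_budget(t, 10) for t in (0, 10)])  # [40, 5]
```

As with a learning rate, a large early budget lets the optimizer explore, while the shrinking budget stabilizes the prompt in later steps.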

Paper Structure

This paper contains 50 sections, 2 equations, 6 figures, and 17 tables.

Figures (6)

  • Figure 1: Comparisons of GPO to existing LLM-based prompt optimizers in terms of effectiveness (Accuracy) and efficiency (improvement per dollar spent on API) on BBH.
  • Figure 2: The overview of the GPO framework. "Current information", "Trajectory", "Edit distance", and "Refinement strategy" are concepts of LLM-based prompt optimizers, which can correspond to "Descent direction", "Momentum", "Learning rate", and "Descent" in gradient-based model optimizers.
  • Figure 3: The efficiency of our approach GPO w.r.t. optimization steps and token consumption.
  • Figure 4: Performance comparison w.r.t. the temperature of the LLM in GPO and the length of the trajectory.
  • Figure 5: Execution time of LLM-based prompt optimizers.
  • ...and 1 more figure
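Figure 2 maps four components of an LLM-based prompt optimizer onto a gradient-based optimizer. As an illustration only, the hypothetical `build_meta_prompt` function below shows how those components could be assembled into a single meta-prompt. The wording, the use of failed examples as "current information", and the ascending sort of the trajectory by score are assumptions, not GPO's actual template.

```python
from __future__ import annotations


def build_meta_prompt(
    current_prompt: str,
    feedback: list[str],
    trajectory: list[tuple[str, float]],
    max_edit_words: int,
) -> str:
    """Assemble a hypothetical meta-prompt for one refinement step.

    Comments map each part to its gradient-optimizer counterpart in Figure 2.
    """
    lines = []
    # "Trajectory" ~ momentum: earlier prompts with scores, lowest score first.
    lines.append("Previous prompts and their scores:")
    for prompt, score in sorted(trajectory, key=lambda p: p[1]):
        lines.append(f"- {prompt!r} (score: {score:.2f})")
    # "Current information" ~ descent direction: the prompt being refined
    # plus examples it currently gets wrong (assumed form of the signal).
    lines.append(f"Current prompt: {current_prompt!r}")
    lines.append("Examples the current prompt fails on:")
    lines.extend(f"- {example}" for example in feedback)
    # "Edit distance" ~ learning rate: how far the new prompt may move.
    # "Refinement strategy" ~ descent: the LLM writes the updated prompt.
    lines.append(
        f"Write an improved prompt that changes at most {max_edit_words} words "
        "of the current prompt. Output only the new prompt."
    )
    return "\n".join(lines)


meta = build_meta_prompt(
    "Let's think step by step.",
    ["Q: 17 + 25? Expected 42, got 32."],
    [("Think step by step.", 0.61), ("Answer directly.", 0.48)],
    max_edit_words=10,
)
print(meta)
```

The returned string would be sent to the optimizer LLM, and its reply becomes the candidate prompt for the next step.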