Localized Zeroth-Order Prompt Optimization

Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiangqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low

TL;DR

This paper proposes localized zeroth-order prompt optimization (ZOPO), an algorithm that incorporates a derived Gaussian process based on the Neural Tangent Kernel into standard zeroth-order optimization to search efficiently for well-performing local optima in prompt optimization.

Abstract

The efficacy of large language models (LLMs) in understanding and generating natural language has aroused wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methods usually prioritize global optimization to find the global optimum, which, however, performs poorly on certain tasks. This motivates us to rethink the necessity of finding a global optimum in prompt optimization. To answer this, we conduct a thorough empirical study on prompt optimization and draw two major insights. In contrast to the rarity of the global optimum, local optima are usually prevalent and perform well, which makes them more worthwhile targets for efficient prompt optimization (Insight I). The choice of the input domain, covering both the generation and the representation of prompts, affects the identification of well-performing local optima (Insight II). Inspired by these insights, we propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO), which incorporates a derived Gaussian process based on the Neural Tangent Kernel into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. Remarkably, ZOPO outperforms existing baselines in terms of both optimization performance and query efficiency, as we demonstrate through extensive experiments.
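
The idea of following a gradient estimated from a Gaussian process toward a nearby local optimum can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an RBF kernel in place of the Neural Tangent Kernel, a finite set of candidate prompt embeddings, and a toy scoring function. The names `rbf_kernel`, `gp_mean_gradient`, `localized_zo_search`, and all parameters are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_mean_gradient(z, Z, y, length_scale=1.0, noise=1e-3):
    """Gradient of the GP posterior mean at z, given queried points (Z, y)."""
    K = rbf_kernel(Z, Z, length_scale) + noise * np.eye(len(Z))
    alpha = np.linalg.solve(K, y - y.mean())  # centre scores for a zero-mean prior
    k_z = rbf_kernel(z[None, :], Z, length_scale)[0]
    # d/dz exp(-|z - z_i|^2 / (2 l^2)) = -(z - z_i) / l^2 * k(z, z_i)
    dk = -(z[None, :] - Z) / length_scale**2 * k_z[:, None]
    return dk.T @ alpha


def localized_zo_search(candidates, score_fn, start_idx, steps=10, lr=0.5, n_neighbors=4):
    """Ascend the estimated gradient over a finite set of prompt embeddings."""
    scores = {}  # candidate index -> queried score (the query history)

    def query_around(idx):
        # Query idx and its nearest neighbours (a simple local exploration).
        dist = ((candidates - candidates[idx]) ** 2).sum(axis=1)
        for j in np.argsort(dist)[: n_neighbors + 1]:
            j = int(j)
            if j not in scores:
                scores[j] = score_fn(candidates[j])

    cur = start_idx
    query_around(cur)
    for _ in range(steps):
        idx = list(scores)
        Z = candidates[idx]
        y = np.array([scores[i] for i in idx])
        grad = gp_mean_gradient(candidates[cur], Z, y)
        target = candidates[cur] + lr * grad
        # Project the continuous update back onto the candidate set.
        cur = int(np.argmin(((candidates - target) ** 2).sum(axis=1)))
        query_around(cur)
    best = max(scores, key=scores.get)
    return best, scores[best], len(scores)
```

In this sketch `score_fn` stands in for a validation-accuracy query to a black-box LLM, so the size of `scores` is the query budget actually spent. Only points near the current iterate are queried, which is what makes the search local rather than global.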

Paper Structure

This paper contains 42 sections, 3 theorems, 14 equations, 11 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

Assume $k(z,z') \leq \alpha$ and $\left\| k''(z, z)\right\| \leq \kappa$ for any $z,z' \in {\mathcal{Z}}$. Let $\delta \in (0,1)$ and $N_{z,\beta} \triangleq \{z' \in \{z_{\tau}\}_{\tau=1}^t \mid \left\|\partial_z k(z', z)\right\|^2 \geq \beta \}$ for a given input $z \in {\mathcal{Z}}$, the following holds, where $\omega = d + 2(\sqrt{d}+1)\ln(1/\delta)$ and $\Sigma^2_t(z) \triangleq \Sigma^2_t(z, z)$.
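
To get a feel for the constant $\omega$ in Proposition 1, the short sketch below evaluates $\omega = d + 2(\sqrt{d}+1)\ln(1/\delta)$ directly. The function name `omega` and the example values of $d$ and $\delta$ are illustrative assumptions, not values taken from the paper.

```python
import math


def omega(d, delta):
    """Evaluate omega = d + 2 * (sqrt(d) + 1) * ln(1 / delta)."""
    if d <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need d > 0 and 0 < delta < 1")
    return d + 2.0 * (math.sqrt(d) + 1.0) * math.log(1.0 / delta)


# Illustrative values only: d = 4 and delta = e^{-1} give 4 + 2 * 3 * 1 = 10.
example = omega(4, math.exp(-1))
```

The constant grows linearly in the dimension $d$ and only logarithmically in $1/\delta$, so tightening the confidence level is cheap compared with raising the dimension of the input domain.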

Figures (11)

  • Figure 1: The performance profile for different methods on instruction induction tasks, where $\tau$ indicates the distance from optimality and $\rho(\tau)$ is the frequency with which a method falls within distance $\tau$ of optimality.
  • Figure 2: The validation accuracy of 300 randomly sampled prompts with the last token representation on various tasks.
  • Figure 3: The estimated accuracy distribution of prompts generated by Vicuna-13B or ChatGPT on various instruction induction tasks, where the vertical dotted line is the mean performance.
  • Figure 4: The function surfaces on various tasks using the last token embedding from Vicuna-13B or the SBERT embedding as the representation for prompt candidates that are generated by Vicuna-13B.
  • Figure 5: Comparison of the query efficiency between our ZOPO and other existing baselines on instruction induction tasks. The first row shows the test accuracy and the second row shows the validation accuracy across different tasks.
  • ...and 6 more figures
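
The performance profile described in the caption of Figure 1 can be computed as in the following sketch. It assumes that "optimality" on each task means the best score achieved by any of the compared methods, which is an assumption of this example rather than something stated in the caption. The function name `performance_profile` and the scores are hypothetical.

```python
import numpy as np


def performance_profile(scores, taus):
    """Return rho with shape (n_methods, n_taus).

    scores: array of shape (n_methods, n_tasks), higher is better.
    rho[m, k] is the fraction of tasks on which method m is within
    taus[k] of the best score on that task.
    """
    scores = np.asarray(scores, dtype=float)
    taus = np.asarray(taus, dtype=float)
    # Distance from the per-task best score (assumed notion of optimality).
    gap = scores.max(axis=0, keepdims=True) - scores
    return (gap[:, :, None] <= taus[None, None, :]).mean(axis=1)


# Made-up accuracies (in percent) for two methods on three tasks.
toy_scores = [[90, 50, 70],
              [80, 60, 50]]
rho = performance_profile(toy_scores, taus=[0, 10, 20])
# rho[0] is [2/3, 1, 1] and rho[1] is [1/3, 2/3, 1].
```

A method whose curve reaches 1 at a smaller $\tau$ is closer to the best result on more tasks.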

Theorems & Definitions (3)

  • Proposition 1
  • Lemma A.1: Thm. 1 in zord
  • Lemma A.2: Lemma B.4 in zord