InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou

TL;DR

This work tackles instruction optimization for black-box LLMs by converting the discrete, structured instruction search into a low-dimensional continuous problem. It introduces InstructZero, which learns a soft prompt for an open-source LLM so that the open-source LLM generates task-specific instructions for a black-box API LLM. The search is driven by Bayesian optimization with an instruction-coupled kernel that aligns the latent soft-prompt space with instruction similarity. Across 32 tasks, including ChatGPT-based evaluations, InstructZero achieves state-of-the-art zero-shot performance and often surpasses much larger baselines, while exploring the latent space efficiently with a CMA-ES-based search. The approach works with small open-source models and yields practical gains for automated prompt engineering in real-world API settings.

Abstract

Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.
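
The iteration described in the abstract can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_instruction`, `score_instruction`, and `propose` are hypothetical callables standing in for the open-source LLM, the zero-shot evaluation on the black-box LLM, and Bayesian optimization. The toy stand-ins below, including random proposals in place of Bayesian optimization, exist only to make the sketch runnable.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
History = List[Tuple[Vector, float]]


def optimize_instruction(
    generate_instruction: Callable[[Vector], str],  # stand-in: open-source LLM
    score_instruction: Callable[[str], float],      # stand-in: black-box LLM eval
    propose: Callable[[History], Vector],           # stand-in: Bayesian optimization
    n_iters: int,
) -> Tuple[str, float]:
    """Run the propose -> generate -> evaluate loop and keep the best instruction."""
    history: History = []
    best_instruction, best_score = "", float("-inf")
    for _ in range(n_iters):
        soft_prompt = propose(history)                  # new soft prompt to try
        instruction = generate_instruction(soft_prompt)  # soft prompt -> instruction
        score = score_instruction(instruction)           # zero-shot performance
        history.append((soft_prompt, score))             # feedback for the proposer
        if score > best_score:
            best_instruction, best_score = instruction, score
    return best_instruction, best_score


# Toy stand-ins (assumptions for illustration only, no real LLM is called).
CANDIDATES = [
    "Repeat the input.",
    "Translate the input.",
    "Write the antonym of the input.",
]
rng = random.Random(0)


def toy_generate(soft_prompt: Vector) -> str:
    # Map the first coordinate in [0, 1) to one of the candidate instructions.
    idx = min(int(soft_prompt[0] * len(CANDIDATES)), len(CANDIDATES) - 1)
    return CANDIDATES[idx]


def toy_score(instruction: str) -> float:
    # Pretend the target task is antonym generation.
    return 1.0 if "antonym" in instruction else 0.0


def random_propose(history: History) -> Vector:
    # Ignores the history; a real proposer would use it to guide the search.
    return [rng.random() for _ in range(4)]


best_instruction, best_score = optimize_instruction(
    toy_generate, toy_score, random_propose, n_iters=20
)
```

Passing the components as callables keeps both models frozen and interchangeable: only the low-dimensional vector changes between iterations, and the proposer sees nothing but past (soft prompt, score) pairs.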

Paper Structure (23 sections, 10 equations, 9 figures, 4 tables, 1 algorithm)

Figures (9)

  • Figure 1: Comparison between InstructZero and two baselines, APE and uniform sampling (both defined in the paper's baselines section). Left: InstructZero generates a more precise instruction, leading to better performance (higher execution accuracy). Right: Histogram of InstructZero's improvement over APE and Uniform on 32 tasks. On a majority of the evaluated tasks, InstructZero improves accuracy by an amount in the range $[20\%,100\%)$.
  • Figure 2: Pipeline of InstructZero. On each iteration, a soft prompt and a few exemplars of the target task are sent to the open-source LLM to generate an instruction, which then prompts the black-box LLM to produce answers to target-task queries. The score (e.g., accuracy) of the answers and the soft prompt are added as new training data for BO, which updates its posterior over the objective and produces a new soft prompt to explore in the next iteration. Both LLMs are frozen.
  • Figure 3: The pipeline of Bayesian optimization in InstructZero, as proposed in the paper's Bayesian optimization section.
  • Figure 4: Zero-shot test accuracy on 32 tasks from Honovich et al. (2022). InstructZero achieves the best performance on all 32 tasks among the three evaluated approaches.
  • Figure 5:
  • ...and 4 more figures
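
The Bayesian optimization step in the Figure 2 and Figure 3 captions (update the posterior from scored soft prompts, then pick the next soft prompt) can be illustrated with a small Gaussian-process sketch. Several choices here are assumptions and not taken from the paper: a plain RBF kernel is swapped in for the instruction-coupled kernel, an upper-confidence-bound rule is used as the acquisition function, and the next soft prompt is chosen from a fixed candidate set. The names `rbf_kernel`, `gp_posterior`, and `propose_next` are hypothetical.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(x_train, y_train, x_cand, noise: float = 1e-6):
    """Posterior mean and variance of a zero-mean GP at the candidate points."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tc = rbf_kernel(x_train, x_cand)
    alpha = np.linalg.solve(k_tt, y_train)   # K^{-1} y
    v = np.linalg.solve(k_tt, k_tc)          # K^{-1} k_*
    mean = k_tc.T @ alpha
    var = 1.0 - np.sum(k_tc * v, axis=0)     # RBF prior variance k(x, x) = 1
    return mean, np.clip(var, 0.0, None)


def propose_next(x_train, y_train, x_cand, beta: float = 2.0) -> np.ndarray:
    """Pick the candidate with the highest upper confidence bound."""
    mean, var = gp_posterior(x_train, y_train, x_cand)
    ucb = mean + beta * np.sqrt(var)
    return x_cand[int(np.argmax(ucb))]


# Two scored soft prompts (1-D here for readability) and four candidates.
x_train = np.array([[0.0], [1.0]])
y_train = np.array([0.2, 0.9])
x_cand = np.array([[0.0], [0.5], [1.0], [3.0]])

mean, var = gp_posterior(x_train, y_train, x_train)
next_prompt = propose_next(x_train, y_train, x_cand)
```

At already-evaluated soft prompts the posterior mean reproduces the observed scores and the variance collapses toward zero, so the acquisition rule is drawn toward regions that are either promising or still unexplored.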