Task Facet Learning: A Structured Approach to Prompt Optimization

Gurusha Juneja, Gautam Jajoo, Nagarajan Natarajan, Hua Li, Jian Jiao, Amit Sharma

TL;DR

This work reframes prompt optimization as learning multiple task facets and proposes UniPrompt, a facet-aware method that structures prompts into sections and updates them via clustered minibatch feedback from an expert LLM. By combining topic- and feedback-based clustering with a two-tier feedback mechanism and flexible editing (add/edit/delete) of prompt sections, UniPrompt consistently outperforms human-tuned prompts and several state-of-the-art methods across diverse tasks, including reasoning, multi-choice, and code generation. The results demonstrate that long, complex prompts capturing diverse facets can be generated automatically, enabling more accurate LLM performance with practical computational considerations. The approach offers a principled pathway to scalable, facet-rich prompt construction and suggests fruitful directions involving connections to submodular optimization and broader model evaluations.

Abstract

Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model. Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whether existing algorithmic approaches, based on iteratively editing a given prompt or automatically selecting a few in-context examples, can cover the multiple facets required to solve a complex task. In this work, we view prompt optimization as the problem of learning multiple facets of a task from a set of training examples. We exploit structure in the prompt optimization problem and break down a prompt into loosely coupled semantic sections. The proposed algorithm, UniPrompt, (1) clusters the input space and uses clustered batches so that each batch likely corresponds to a different facet of the task, and (2) utilizes a feedback mechanism to propose adding, editing, or deleting a section, with the proposals aggregated over a batch to capture generalizable facets. Empirical evaluation on multiple datasets and a real-world task shows that prompts generated using UniPrompt obtain higher accuracy than human-tuned prompts and those from state-of-the-art methods. In particular, our algorithm can generate long, complex prompts that existing methods are unable to generate. Code for UniPrompt is available at https://aka.ms/uniprompt.
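
The two steps named in the abstract (clustered batches, then batch-aggregated add/edit/delete proposals on prompt sections) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the link above. All names here (`Edit`, `apply_edit`, `cluster_batches`, `aggregate`, `optimize`, and the callables `facet_of`, `propose_edit`, `score`) are hypothetical. Majority voting stands in for the expert-LLM feedback aggregation, and the greedy acceptance rule is an assumption made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Prompt = Dict[str, str]  # section title -> section body (loosely coupled sections)


@dataclass(frozen=True)
class Edit:
    op: str          # "add", "edit", or "delete"
    section: str     # title of the section the edit targets
    text: str = ""   # new section body (ignored for "delete")


def apply_edit(prompt: Prompt, e: Edit) -> Prompt:
    """Return a copy of the prompt with one section-level edit applied."""
    new = dict(prompt)
    if e.op == "delete":
        new.pop(e.section, None)
    elif e.op in ("add", "edit"):
        new[e.section] = e.text
    else:
        raise ValueError(f"unknown edit op: {e.op}")
    return new


def render(prompt: Prompt) -> str:
    """Flatten the sectioned prompt into the text sent to the model."""
    return "\n\n".join(f"{title}:\n{body}" for title, body in prompt.items())


def cluster_batches(examples: List, facet_of: Callable, batch_size: int) -> List[List]:
    """Group examples by a facet key, then cut each cluster into batches."""
    clusters = defaultdict(list)
    for ex in examples:
        clusters[facet_of(ex)].append(ex)
    return [members[i:i + batch_size]
            for members in clusters.values()
            for i in range(0, len(members), batch_size)]


def aggregate(edits: List[Edit]) -> Optional[Edit]:
    """Assumption: majority vote replaces LLM-based aggregation of feedback."""
    return Counter(edits).most_common(1)[0][0] if edits else None


def optimize(prompt: Prompt, examples: List, facet_of: Callable,
             propose_edit: Callable, score: Callable, batch_size: int = 2) -> Prompt:
    """One pass over clustered batches with greedy acceptance (an assumption)."""
    for batch in cluster_batches(examples, facet_of, batch_size):
        proposals = [propose_edit(prompt, ex) for ex in batch]
        chosen = aggregate([p for p in proposals if p is not None])
        if chosen is None:
            continue
        candidate = apply_edit(prompt, chosen)
        # Keep the edit only if it does not hurt the score on the examples.
        if score(candidate, examples) >= score(prompt, examples):
            prompt = candidate
    return prompt
```

In practice, `propose_edit` would wrap a call to an expert LLM that inspects the current prompt and a wrongly answered example, and `score` would measure accuracy on held-out data. Both are left as plain callables here so that the sketch stays independent of any particular model API.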

Paper Structure (39 sections, 1 equation, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Existing prompt optimization methods (left) versus UniPrompt (right) on the Ethos dataset. [Left] State-of-the-art prompt optimization methods such as ProTeGi (Pryzant et al., 2023) sample from the questions that the current prompt answers incorrectly and use an expert LLM (e.g., GPT-4) to obtain feedback on the mistakes. This approach tends to produce very general edits or to overfit to specific examples. [Right] In contrast, UniPrompt identifies key task facets by (1) clustering examples with similar task facets and (2) employing a two-tier feedback-based update strategy. The resulting prompt updates extract generalizable concepts from the specific examples.
  • Figure 2: Estimating the (probabilistic) Lipschitz constant of models (Definition 1) on the (left) Ethos, (middle) GSM8K, and (right) MedQA datasets for the GPT-4 and GPT-3.5 models.
  • Figure 3: Evolution of prompts through iterations of UniPrompt on the Ethos dataset. Starting from a simple one-line prompt having an accuracy of $82\%$, UniPrompt adds background knowledge, corner cases, and additional sub-sections yielding a prompt with accuracy $88\%$. After further iterations, our algorithm converges to a detailed, human-like longform prompt that achieves accuracy of $92\%$.
  • Figure 4: Comparison of a human-written prompt and the prompt produced by UniPrompt on the MedQA dataset.
  • Figure 5: Comparison of the prompts produced by the state-of-the-art OPRO method (Yang et al., 2023) and by UniPrompt on the GSM8K dataset.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Definition 1: Probabilistic Lipschitz Continuity (Nori et al.)
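
For readers unfamiliar with the term, one common way to state probabilistic Lipschitz continuity is sketched below. This is a generic formulation and may differ from the exact statement of Definition 1 in the paper. The symbols $f$ (the model), $\mathcal{D}$ (the input distribution), $d_{\mathcal{X}}$ and $d_{\mathcal{Y}}$ (distances on inputs and outputs), $K$, and $\delta$ are assumptions introduced here for illustration.

```latex
% Assumed generic form: f is (K, \delta)-probabilistically Lipschitz
% with respect to \mathcal{D} if
\[
  \Pr_{x,\,x' \sim \mathcal{D}}
  \Big[\, d_{\mathcal{Y}}\big(f(x), f(x')\big) \;\le\; K \, d_{\mathcal{X}}(x, x') \,\Big]
  \;\ge\; 1 - \delta ,
  \qquad K \ge 0,\; \delta \in [0, 1].
\]
```

Setting $\delta = 0$ recovers the ordinary Lipschitz condition holding almost surely under $\mathcal{D}$. A larger $\delta$ tolerates a small fraction of input pairs on which the bound fails.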