
Robust Prompt Optimization for Large Language Models Against Distribution Shifts

Moxin Li, Wenjie Wang, Fuli Feng, Yixin Cao, Jizhi Zhang, Tat-Seng Chua

TL;DR

This work identifies a robustness gap in gradient-free prompt optimization for LLMs under distribution shifts between source and target groups. It introduces the Generalized Prompt Optimization (GPO) framework, which combines meta-prompt-driven generation of multiple prompts, label ensemble strategies on unlabeled target inputs, and joint optimization to produce a single task-specific prompt that generalizes across distributions. In experiments on 16 datasets spanning six NLP tasks, GPO improves target-group performance while maintaining source-group performance relative to strong baselines such as APE and APO, and it is effective across multiple backbone LLMs, including GPT-4. The study highlights the importance of accurate target labeling and consistency thresholds, and points to practical avenues for deploying robust prompts in real-world, shifting environments.

Abstract

Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, leading to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer review analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires that the prompt optimized over the labeled source group simultaneously generalize to an unlabeled target group. To solve this problem, we propose the Generalized Prompt Optimization framework, which incorporates the unlabeled data from the target group into prompt optimization. Extensive experimental results demonstrate the effectiveness of the proposed framework, with significant performance improvement on the target group and comparable performance on the source group.
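
The label ensemble and consistency threshold mentioned in the TL;DR can be illustrated with a minimal sketch. This summary does not spell out the paper's exact procedure, so the code below is an assumption: it uses plain majority voting across several prompts and keeps a target input only when the share of agreeing prompts reaches a threshold. All names (`ensemble_label`, `pseudo_label_target`, `toy_predict`, `PROMPTS`) and the default threshold of 0.6 are hypothetical, and `toy_predict` is a keyword rule standing in for a real LLM call.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a predictor maps (prompt, input text) to a label.
Predictor = Callable[[str, str], str]


def ensemble_label(
    prompts: Sequence[str],
    text: str,
    predict: Predictor,
    threshold: float = 0.6,  # assumed value, not taken from the paper
) -> Optional[str]:
    """Majority-vote label for `text`, or None if agreement is too low."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    votes = Counter(predict(p, text) for p in prompts)
    label, count = votes.most_common(1)[0]
    # Consistency = fraction of prompts that agree with the majority label.
    return label if count / len(prompts) >= threshold else None


def pseudo_label_target(
    prompts: Sequence[str],
    target_inputs: Sequence[str],
    predict: Predictor,
    threshold: float = 0.6,
) -> List[Tuple[str, str]]:
    """Keep only the unlabeled target inputs whose ensemble label is consistent."""
    labeled = []
    for text in target_inputs:
        label = ensemble_label(prompts, text, predict, threshold)
        if label is not None:
            labeled.append((text, label))
    return labeled


def toy_predict(prompt: str, text: str) -> str:
    """Stand-in for an LLM call: a keyword rule that varies with the prompt."""
    if "strict" in prompt:
        return "positive" if "great" in text else "negative"
    return "positive" if ("great" in text or "fine" in text) else "negative"


# Hypothetical candidate prompts, e.g. produced by an earlier generation step.
PROMPTS = ["strict sentiment", "lenient sentiment", "lenient sentiment v2"]

if __name__ == "__main__":
    reviews = ["great phone", "it is fine", "broken on arrival"]
    for pair in pseudo_label_target(PROMPTS, reviews, toy_predict, threshold=1.0):
        print(pair)
```

Raising the threshold trades coverage for label quality: fewer target inputs receive a pseudo-label, but the ones that do are those on which the prompts agree more strongly.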

Paper Structure (32 sections, 5 equations, 3 figures, 15 tables)


Figures (3)

  • Figure 1: Illustration of prompt optimization under distribution shifts. Existing prompt optimization solutions aim to improve the LLM performance on the training data, while it is unclear whether the optimized prompt can be generalized to testing data of the same task but with distribution shifts.
  • Figure 2: The GPO Framework.
  • Figure 3: Target group performance under different percentages of wrong labels. The blue dotted line indicates the labeling accuracy of GPO as in Table \ref{tab:conf_acc}.