Prompt Optimization via Adversarial In-Context Learning

Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

TL;DR

Adversarial In-Context Learning (adv-ICL) tackles prompt optimization for in-context learning by introducing a three-LLM, adversarial setup that keeps model weights fixed while prompts are iteratively improved through a Generator, a Discriminator, and a Prompt Modifier. The framework defines a GAN-like objective and a minimax training loop, with a zero-shot prompt modification strategy to generate prompt variants. Empirical results across 13 NLP tasks show consistent improvements over state-of-the-art prompt-optimization baselines for both open- and closed-source models, including notable gains on generation, classification, and reasoning benchmarks such as MMLU and BBH, even with limited data. The method is computationally efficient, data-light, and broadly applicable, though it relies on capable LLMs and careful discriminator-generator pairing; future work includes exploring model combinations and extending the approach to more tasks.

Abstract

We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate output realistic enough to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator's input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open- and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and BIG-Bench Hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
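The round described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three LLMs are replaced by toy functions, every name (`adversarial_loss`, `adv_icl_round`, `toy_generator`, `toy_discriminator`, `toy_modifier`) is hypothetical, and the loss is assumed to take the standard GAN form, with the discriminator prompt chosen to increase it and the generator prompt chosen to decrease it.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures standing in for calls to frozen LLMs.
GeneratorFn = Callable[[str, str], str]             # (prompt, input) -> output
DiscriminatorFn = Callable[[str, str, str], float]  # (prompt, input, output) -> P(real)
ModifierFn = Callable[[str], List[str]]             # prompt -> candidate prompts


def adversarial_loss(gen: GeneratorFn, disc: DiscriminatorFn,
                     gen_prompt: str, disc_prompt: str,
                     real_pairs: Sequence[Tuple[str, str]],
                     inputs: Sequence[str]) -> float:
    """Assumed GAN-style value: mean log D(real) + mean log(1 - D(generated))."""
    eps = 1e-9  # avoids log(0) when the discriminator is fully confident
    real = sum(math.log(disc(disc_prompt, x, y) + eps) for x, y in real_pairs)
    fake = sum(math.log(1.0 - disc(disc_prompt, x, gen(gen_prompt, x)) + eps)
               for x in inputs)
    return real / len(real_pairs) + fake / len(inputs)


def adv_icl_round(gen: GeneratorFn, disc: DiscriminatorFn, modifier: ModifierFn,
                  gen_prompt: str, disc_prompt: str,
                  real_pairs: Sequence[Tuple[str, str]],
                  inputs: Sequence[str]) -> Tuple[str, str]:
    """One round: update the discriminator prompt, then the generator prompt."""
    # Discriminator step: keep the candidate prompt that raises the loss most.
    best_d = disc_prompt
    best_val = adversarial_loss(gen, disc, gen_prompt, best_d, real_pairs, inputs)
    for cand in modifier(disc_prompt):
        val = adversarial_loss(gen, disc, gen_prompt, cand, real_pairs, inputs)
        if val > best_val:
            best_d, best_val = cand, val

    # Generator step: keep the candidate prompt that lowers the loss most.
    best_g = gen_prompt
    best_val = adversarial_loss(gen, disc, best_g, best_d, real_pairs, inputs)
    for cand in modifier(gen_prompt):
        val = adversarial_loss(gen, disc, cand, best_d, real_pairs, inputs)
        if val < best_val:
            best_g, best_val = cand, val
    return best_g, best_d


# Toy stand-ins (assumptions for illustration only, not real LLM behavior).
def toy_generator(prompt: str, x: str) -> str:
    # "Follows" the instruction only if the prompt mentions UPPER.
    return x.upper() if "UPPER" in prompt else x


def toy_discriminator(prompt: str, x: str, y: str) -> float:
    # Without the "check-case" hint it cannot tell real from generated.
    if "check-case" not in prompt:
        return 0.5
    return 0.9 if y == x.upper() else 0.1


def toy_modifier(prompt: str) -> List[str]:
    # A real prompt modifier would be an LLM rewriting instructions/exemplars.
    return [prompt + " UPPER", prompt + " check-case"]


real_pairs = [("cat", "CAT"), ("dog", "DOG")]  # labeled (input, output) data
inputs = ["bird", "fish"]                      # inputs for the generator
new_gen_prompt, new_disc_prompt = adv_icl_round(
    toy_generator, toy_discriminator, toy_modifier,
    "Transform the input.", "Is this output real?", real_pairs, inputs)
```

In this toy run the discriminator first adopts the prompt variant that lets it separate real from generated pairs, and the generator then adopts the variant whose outputs the updated discriminator accepts as real. Model weights never change; only the two prompt strings do.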

Paper Structure (53 sections, 3 theorems, 5 equations, 11 figures, 11 tables)

Key Result

Proposition 1

(Motivated by Goodfellow et al., 2014) If $G$ and $D$ have enough capacity, and at each training step the discriminator is allowed to reach its optimum $D^*$ given $G$, and $p_g$ is updated so as to improve the criterion, then $p_g$ converges to $p_{data}$.
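For context, the classical GAN result that this proposition is modeled on can be written out as follows. This is the standard formulation from Goodfellow et al. (2014), with $z \sim p_z$ the generator's input noise; the adv-ICL criterion itself is not reproduced in this summary, so its exact notation may differ.

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
$$

For a fixed $G$, the optimal discriminator and the resulting value are

$$
D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}, \qquad V(D^*, G) = -\log 4 + 2\,\mathrm{JSD}(p_{data} \,\|\, p_g).
$$

Since the Jensen-Shannon divergence is non-negative and zero only when its arguments coincide, the value is minimized exactly when $p_g = p_{data}$, which is the fixed point the proposition refers to.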

Figures (11)

  • Figure 1: adv-ICL orchestrates a minimax game between a Generator and a Discriminator, both powered by LLMs with few-shot prompts. The Generator crafts responses to unlabeled examples, while the Discriminator distinguishes between generated and ground truth outputs. Updates are made by a Prompt Modifier which modifies prompts based on the adversarial loss.
  • Figure 2: An example of a task prompt for the discriminator $D_V$ with prompt components labeled.
  • Figure 3: Example of how the prompt modifier generates new versions of $G_U$'s prompt $U$ including new task instructions and new data examples. Full prompts used for $M$ are in the appendix.
  • Figure 4: Results on selected tasks from BBH with ChatGPT using 5-shot Chain-of-Thought prompting. Full results can be found in the appendix.
  • Figure 5: Ablation study on ChatGPT with adv-ICL in which we only update the task instruction or demonstrations.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Proof: Proof for Proposition \ref{prop:prop1}
  • Proposition 3
  • Proof: Proof for Proposition \ref{prop:prop4}