Large Language Models Prompting With Episodic Memory

Dai Do, Quan Tran, Svetha Venkatesh, Hung Le

TL;DR

POEM addresses the challenge of selecting and ordering in-context demonstrations for LLM prompting in few-shot settings. It treats prompt optimization as an RL problem with an episodic memory that archives state-action-reward tuples; at test time, a nearest-neighbor memory reader chooses the example ordering for each query based on its similarity to training instances. POEM outperforms recent baselines such as TEMPERA and RLPrompt by over 5.3% in few-shot text classification, outperforms conventional heuristic ordering methods on general language understanding tasks, and converges faster than competing RL-based methods. Because it optimizes prompt structure without heavy additional fine-tuning, it offers a practical way to improve LLM prompting across diverse NLP tasks.

Abstract

Prompt optimization is essential for enhancing the performance of Large Language Models (LLMs) in a range of Natural Language Processing (NLP) tasks, particularly in scenarios of few-shot learning where training examples are incorporated directly into the prompt. Despite the growing interest in optimizing prompts with few-shot examples, existing methods for prompt optimization are often resource-intensive or perform inadequately. In this work, we propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities. We approach prompt optimization as a Reinforcement Learning (RL) challenge, using episodic memory to archive combinations of input data, permutations of few-shot examples, and the rewards observed during training. In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory. Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks. Furthermore, our approach adapts well to broader language understanding tasks, consistently outperforming conventional heuristic methods for ordering examples.
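
The test-time selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the memory layout (a list of `(embedding, permutation, reward)` tuples), the cosine similarity measure, and the names `cosine` and `select_ordering` are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_ordering(memory, query_emb, k=3):
    """Pick the permutation with the highest total reward among the
    k stored entries most similar to the query.

    memory: non-empty list of (embedding, permutation tuple, reward).
    Assumed layout, for illustration only.
    """
    if not memory:
        raise ValueError("memory must contain at least one entry")
    # Rank stored training entries by similarity to the test query.
    nearest = sorted(memory, key=lambda e: cosine(e[0], query_emb), reverse=True)[:k]
    # Sum the rewards observed for each permutation among the neighbors.
    totals = defaultdict(float)
    for _, permutation, reward in nearest:
        totals[permutation] += reward
    return max(totals, key=totals.get)

# Toy memory: 2-d embeddings, orderings of three demonstrations.
memory = [
    ([1.0, 0.0], (0, 1, 2), 0.9),
    ([0.9, 0.1], (1, 0, 2), 0.4),
    ([0.8, 0.2], (0, 1, 2), 0.3),
    ([0.0, 1.0], (2, 1, 0), 1.0),
]
best = select_ordering(memory, [1.0, 0.0], k=3)
```

In this toy run the three nearest entries vote with their rewards, so an ordering that appears several times among the neighbors can win over one with a single high reward.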

Paper Structure

This paper contains 18 sections, 12 equations, 2 figures, 15 tables, 1 algorithm.

Figures (2)

  • Figure 1: POEM Architecture. Training (left): In this phase, we select examples from the in-context dataset $D_{ic}$. The training query and the ICL example ordering are encoded into $s$ and $a$, respectively, and are used to construct a prompt for each training query. We then obtain a reward $r$ by feeding the prompt to the downstream language model (LM) and store the state, action, and reward in memory $\mathcal{M}$ using Memory Writing (Eq. \ref{Algo1}). Testing (right): During this phase, for each testing query $s_t$, we conduct Memory Reading using nearest neighbor estimation to get the action with the highest estimated value (Eq. \ref{Algo2}). We then build the prompt for the test query by arranging the ICL examples according to the best ordering action $a_t$.
  • Figure 2: Illustration of an action being encoded.
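
The training phase in Figure 1 can be sketched roughly as below. This is an illustrative sketch, not the paper's code: `build_prompt`, `EpisodicMemory`, `training_step`, and the caller-supplied `reward_fn` are hypothetical names, and the write rule (keep the larger reward when the same state-action pair recurs) is borrowed from common episodic-control practice as an assumption; the paper's own Memory Writing equation is not reproduced here.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, ...]   # encoded training query (assumed representation)
Action = Tuple[int, ...]    # ordering of demonstration indices

def build_prompt(examples: Sequence[str], ordering: Action, query: str) -> str:
    """Concatenate demonstrations in the given order, then append the query."""
    demos = [examples[i] for i in ordering]
    return "\n\n".join(demos + [query])

class EpisodicMemory:
    """Maps (state, action) pairs to an observed reward."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[State, Action], float] = {}

    def write(self, state: State, action: Action, reward: float) -> None:
        key = (tuple(state), tuple(action))
        # Assumed rule: keep the best reward seen so far for this pair.
        self.table[key] = max(reward, self.table.get(key, float("-inf")))

def training_step(
    memory: EpisodicMemory,
    examples: Sequence[str],
    query: str,
    state: State,
    ordering: Action,
    reward_fn: Callable[[str], float],
) -> float:
    """Build the prompt, score it with the caller's reward function, store the result."""
    prompt = build_prompt(examples, ordering, query)
    reward = reward_fn(prompt)  # stands in for querying the downstream LM
    memory.write(state, ordering, reward)
    return reward
```

In practice `reward_fn` would wrap a call to the downstream LM and score its output against the training label; here it is left to the caller so the sketch stays self-contained.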