Large Language Models Know What Makes Exemplary Contexts

Quanyu Long; Jianda Chen; Wenya Wang; Sinno Jialin Pan

TL;DR

This work tackles in-context learning (ICL) by enabling LLMs to select and order their own demonstrations through a sequential retrieval process guided by a reward model trained on the LLM's preferences. A parameter-efficient retrieval head, initialized from LLM embeddings, selects $k$ demonstrations autoregressively. A reward head, trained on pairwise preferences with the Bradley–Terry model, provides stable feedback for a PPO-based reinforcement learning update of the retrieval head. The approach improves ICL performance across 11 tasks, retrieves more representative and more diverse demonstrations, and yields a retrieval policy that transfers across LLMs. Because the LLM stays frozen and only the retrieval and reward heads are updated, the method offers a scalable, self-consistent way to optimize context for diverse tasks, with potential applicability to broader retrieval-augmented AI systems.
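As a rough illustration of the pairwise preference objective named above, the following sketch computes the standard Bradley–Terry probability and its negative log-likelihood for a pair of scalar rewards. The function names and the use of plain floats are assumptions made for this example; the paper's reward head, its inputs, and its training loop are not reproduced here.

```python
import math


def bradley_terry_prob(r_preferred: float, r_rejected: float) -> float:
    """Probability that the preferred item wins: sigmoid(r_preferred - r_rejected)."""
    d = r_preferred - r_rejected
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    # Equivalent form that avoids overflow when d is very negative.
    e = math.exp(d)
    return e / (1.0 + e)


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """Negative log-likelihood of the observed preference, i.e. softplus(-d)."""
    d = r_preferred - r_rejected
    # Numerically stable softplus(-d) = max(-d, 0) + log(1 + exp(-|d|)).
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


if __name__ == "__main__":
    # Hypothetical rewards for two candidate demonstration sequences.
    print(bradley_terry_prob(1.5, 0.2))
    print(pairwise_loss(1.5, 0.2))
```

The loss shrinks as the reward assigned to the preferred sequence rises above the reward of the rejected one, which is the property a reward head trained this way relies on.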

Abstract

In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language Models (LLMs). By instructing LLMs with few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without updating millions of parameters. This paper presents a unified framework that allows LLMs to self-select influential in-context examples to compose their contexts, self-rank candidates with different demonstration compositions, and self-optimize demonstration selection and ordering through reinforcement learning. Specifically, our method designs a parameter-efficient retrieval head that generates the optimized demonstrations after training with rewards derived from the LLM's own preferences. Experimental results validate the proposed method's effectiveness in enhancing ICL performance. Additionally, our approach identifies and selects the most representative examples for the current task and retrieves more diverse demonstrations.
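The sequential selection idea can be sketched as a loop that picks one demonstration at a time and updates the query state after each pick, so that both the chosen set and its order depend on earlier choices. Everything specific below is an assumption for illustration: the dot-product scoring, the averaging state update, and greedy argmax selection stand in for the learned retrieval head, which in the paper is trained with reinforcement learning rather than applied greedily.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_demonstrations(
    query_emb: Sequence[float],
    candidate_embs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of up to k candidates, in the order they were selected."""
    state = list(query_emb)  # current context representation (hypothetical)
    chosen: List[int] = []
    for _ in range(min(k, len(candidate_embs))):
        remaining = [i for i in range(len(candidate_embs)) if i not in chosen]
        # Greedy pick: the highest-scoring unused candidate under the current state.
        best = max(remaining, key=lambda i: dot(state, candidate_embs[i]))
        chosen.append(best)
        # Assumed update rule: average the state with the chosen embedding.
        state = [(s + c) / 2.0 for s, c in zip(state, candidate_embs[best])]
    return chosen


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(select_demonstrations(query, candidates, k=2))
```

Because the state changes after every pick, the returned list is an ordering as well as a selection, which mirrors the selection-and-ordering framing in the abstract.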

Paper Structure (23 sections, 2 equations, 3 figures, 5 tables)

Introduction
Method
Problem Definition
Retrieval head modeling
Reward model training
Reinforced retrieval head from self-feedback
Experiments
Datasets
Experiment Setup
Baselines
Main Results
Necessity of reward model
Representativeness and diversity of retrieved demonstrations
The number of demonstration $k$
Related Work
...and 8 more sections

Figures (3)

Figure 1: A self-select, self-rank, and self-optimize framework to retrieve influential in-context examples. The LLM can select its own demonstrations sequentially by updating the contexts of query, and optimize their compositions based on the LLM's own preference.
Figure 2: Proposed Framework Overview. (a) sequential ICL retrieval formulation, (b) the first stage for training the reward head, and (c) the second stage for training the retrieval head.
Figure 3: RTE performance using different $k$.
