In-Context Example Ordering Guided by Label Distributions

Zhichao Xu, Daniel Cohen, Bei Wang, Vivek Srikumar

TL;DR

This work tackles the sensitivity of in-context learning (ICL) to the order of in-context examples by formulating the ordering task as an optimization problem. It introduces Probability Distribution Ordering (PDO), which uses two priors inspired by Learning from Label Proportions to select performant orderings based on the model's predicted probability distributions. PDO supports both Direct and PMI scoring and accommodates the FewShot, FewShotU, and FewShotUP settings. Across 13 text classification datasets and 9 autoregressive LLMs, PDO consistently improves accuracy and reduces calibration error, while also enabling effective task-level exemplar selection without labeled development data. The approach is lightweight, generalizes across models and scoring schemes, and has practical implications for deploying calibrated ICL in real-world tasks.

Abstract

By allowing models to predict without task-specific training, in-context learning (ICL) with pretrained LLMs has enormous potential in NLP. However, a number of problems persist in ICL. In particular, its performance is sensitive to the choice and order of in-context examples. Given the same set of in-context examples with different orderings, model performance may vary from near random to near state-of-the-art. In this work, we formulate in-context example ordering as an optimization problem. We examine three problem settings that differ in the assumptions they make about what is known about the task. Inspired by the idea of learning from label proportions, we propose two principles for in-context example ordering guided by the model's probability predictions. We apply our proposed principles to thirteen text classification datasets and nine different autoregressive LLMs with 700M to 13B parameters. We demonstrate that our approach outperforms the baselines by improving classification accuracy, reducing model miscalibration, and selecting better in-context examples.
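The idea of choosing an ordering from the model's probability predictions can be illustrated with a small sketch. This is not the paper's implementation: it assumes one plausible reading, in which each candidate ordering is scored by the KL divergence between a prior label distribution and the mean predicted label distribution on unlabeled inputs, and the lowest-divergence ordering is kept. The names `select_ordering` and `toy_predict_proba`, the direction of the KL term, and the toy "recency bias" model are all assumptions made for illustration; a real setup would replace `toy_predict_proba` with label probabilities from an LLM.

```python
import math
from itertools import permutations


def mean_distribution(prob_rows):
    """Average per-example label distributions into one distribution."""
    n = len(prob_rows)
    k = len(prob_rows[0])
    return [sum(row[j] for row in prob_rows) / n for j in range(k)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def select_ordering(candidates, predict_proba, unlabeled, prior):
    """Return (ordering, kl) for the candidate whose mean predicted label
    distribution on `unlabeled` is closest to `prior` (assumed criterion)."""
    best, best_kl = None, float("inf")
    for ordering in candidates:
        probs = [predict_proba(ordering, x) for x in unlabeled]
        kl = kl_divergence(prior, mean_distribution(probs))
        if kl < best_kl:
            best, best_kl = ordering, kl
    return best, best_kl


def toy_predict_proba(ordering, x):
    """Hypothetical stand-in for an LLM: it leans toward the label of the
    last demonstration, more strongly when that label is repeated at the end."""
    labels = [label for _, label in ordering]
    last = labels[-1]
    run = 1
    for label in reversed(labels[:-1]):
        if label != last:
            break
        run += 1
    p_last = min(0.5 + 0.1 * run, 0.95)
    probs = [0.0, 0.0]
    probs[last] = p_last
    probs[1 - last] = 1.0 - p_last
    return probs


# Four demonstrations for a binary sentiment task (1 = positive, 0 = negative).
demos = [("great film", 1), ("loved it", 1), ("boring", 0), ("awful", 0)]
candidates = list(permutations(demos))
uniform_prior = [0.5, 0.5]  # assumes a balanced label distribution

best, best_kl = select_ordering(
    candidates, toy_predict_proba, ["input a", "input b"], uniform_prior
)
print([label for _, label in best], round(best_kl, 4))
```

Under the toy model, orderings that end with two demonstrations of the same label skew predictions more heavily, so the selection favors an ordering whose final two labels differ. Enumerating all permutations is only feasible for a handful of demonstrations; with more examples one would sample candidate orderings instead.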

Paper Structure (25 sections, 9 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: KL-divergence vs. accuracy for FewShot and FewShotUP on the SST-2 dataset, with OPT-1.3B as the backbone language model.
  • Figure 2: SST-2 results with different language models.
  • Figure 3: Yahoo topic results with different language models.
  • Figure 4: Mean accuracy over 5 topic classification datasets across different numbers of in-context training examples (from 4 to 12) under FewShotUP. The backbone LLM is LLaMA-7B. PDO's improvement is consistent across different numbers of examples.