PLR: Plackett-Luce for Reordering In-Context Learning Examples

Pawel Batorski, Paul Swoboda

Abstract

In-context learning (ICL) adapts large language models by conditioning on a small set of ICL examples, avoiding costly parameter updates. Among other factors, performance is often highly sensitive to the ordering of the examples. However, exhaustive search over the $n!$ possible orderings is infeasible. More efficient ordering methods therefore either rely on model confidence measures (e.g., label-probability entropy) over label sets or search directly for the best ordering. We propose PLR, a probabilistic approach to in-context example ordering that replaces discrete search over orderings with learning a probability distribution over them. PLR models orderings with a Plackett-Luce distribution and iteratively updates its parameters to concentrate probability mass on orderings that perform well under a task-level metric. Candidate orderings are sampled efficiently via a Gumbel perturb-and-sort procedure. Experiments on multiple classification benchmarks show that PLR consistently improves few-shot accuracy for $k \in \{4, 8, 16, 32\}$ examples, and we further demonstrate gains on mathematical reasoning tasks where label-based ordering methods are not applicable. Our code is available at https://github.com/Batorskq/PLR.
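
The Gumbel perturb-and-sort step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names and the `log_scores` parameterization are assumptions made for illustration. It relies on the standard fact that adding independent Gumbel(0, 1) noise to the log-scores and sorting in descending order yields an exact Plackett-Luce sample.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse-CDF sampling."""
    u = rng.random()
    # rng.random() can return exactly 0.0, where log(u) is undefined.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def sample_ordering(log_scores, rng):
    """Sample one ordering of the examples from a Plackett-Luce distribution.

    log_scores[i] is a hypothetical log-score for ICL example i (an
    assumed parameterization, not necessarily the paper's).
    Returns a list of example indices, first position first.
    """
    perturbed = [theta + sample_gumbel(rng) for theta in log_scores]
    # Sorting the perturbed scores in descending order gives the sample.
    return sorted(range(len(log_scores)), key=lambda i: perturbed[i], reverse=True)
```

Calling `sample_ordering([0.3, -1.2, 2.0, 0.0], random.Random(0))` returns a permutation of the four indices. Examples with larger log-scores tend to appear earlier, and each draw costs one sort instead of an enumeration of all $n!$ orderings.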

Paper Structure (30 sections, 1 theorem, 19 equations, 3 figures, 6 tables, 1 algorithm)

This paper contains 30 sections, 1 theorem, 19 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Let $\Delta(S_n)$ denote the set of all probability distributions over the permutations $S_n$, equipped with the $\ell_1$ norm $\|p-q\|_1=\sum_{\pi\in S_n} |p(\pi)-q(\pi)|$. (i) Density of mixtures. For any target distribution $p\in \Delta(S_n)$ and any $\varepsilon>0$, there exist an integer $K\le n!$ and a mixture $q$ of $K$ Plackett-Luce distributions that satisfies $\|q-p\|_1 < \varepsilon$. In particular, the family of mixture-PL models is dense in $\Delta(S_n)$. [...]
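
For reference, the objects in the theorem can be written out using the standard Plackett-Luce definition. The symbols $\theta$, $w_k$ and $\theta^{(k)}$ are notation assumed here and may differ from the paper's. A single Plackett-Luce model with log-scores $\theta\in\mathbb{R}^n$ assigns to a permutation $\pi=(\pi_1,\dots,\pi_n)$ the probability

$$P_\theta(\pi)=\prod_{i=1}^{n}\frac{\exp(\theta_{\pi_i})}{\sum_{j=i}^{n}\exp(\theta_{\pi_j})},$$

and a $K$-component mixture takes the form

$$q(\pi)=\sum_{k=1}^{K} w_k\,P_{\theta^{(k)}}(\pi),\qquad w_k\ge 0,\quad \sum_{k=1}^{K} w_k=1.$$

A rough parameter count (a heuristic, not the paper's proof) suggests why a single component cannot suffice for $n\ge 3$: $P_\theta$ is unchanged when a constant is added to every $\theta_i$, so it has $n-1$ free parameters, whereas $\Delta(S_n)$ has dimension $n!-1$. For $n=3$ this is 2 free parameters against 5 dimensions.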

Figures (3)

  • Figure 1: Overview of our probabilistic approach to in-context example ordering. We maintain a Plackett-Luce distribution over permutations, repeatedly sample ICL example orderings, evaluate them under a task-level metric, and update the distribution to shift probability mass toward high-performing orders and away from low-performing ones.
  • Figure 2: Illustration of PLR. Given a Plackett-Luce distribution, we sample high-probability permutations with the Gumbel trick. Each permutation is scored, and the top-K are retained and used to fit an improved Plackett-Luce distribution. This is iterated until a high-quality ICL ordering is found.
  • Figure 3: Ablation probability-test accuracy results for PLR-EMA (top) and PLR-1 (bottom).
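
The sample, score, retain top-K, refit loop described in the captions of Figures 1 and 2 can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the refit step here is a plain gradient step on the Plackett-Luce log-likelihood of the retained orderings, which stands in for the paper's actual update rule (the PLR-EMA and PLR-1 variants are not specified in this summary). `fit_ordering`, `score_fn`, `neg_inversions` and all hyperparameter values are hypothetical.

```python
import math
import random


def sample_ordering(log_scores, rng):
    """Gumbel perturb-and-sort: one Plackett-Luce sample."""
    def gumbel():
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        return -math.log(-math.log(u))
    noisy = [t + gumbel() for t in log_scores]
    return sorted(range(len(log_scores)), key=lambda i: noisy[i], reverse=True)


def pl_log_lik_grad(ordering, log_scores):
    """Gradient of log P(ordering) with respect to the log-scores."""
    grad = [0.0] * len(log_scores)
    for pos, chosen in enumerate(ordering):
        tail = ordering[pos:]  # items still unplaced at this position
        m = max(log_scores[j] for j in tail)
        weights = [math.exp(log_scores[j] - m) for j in tail]
        z = sum(weights)
        grad[chosen] += 1.0
        for j, w in zip(tail, weights):
            grad[j] -= w / z  # softmax probability over the tail
    return grad


def fit_ordering(score_fn, n, iters=30, samples=32, top_k=4, lr=0.5, seed=0):
    """Assumed stand-in for the PLR loop: sample, score, keep top-K, refit."""
    rng = random.Random(seed)
    log_scores = [0.0] * n  # start from the uniform distribution
    best, best_score = None, float("-inf")
    for _ in range(iters):
        perms = [sample_ordering(log_scores, rng) for _ in range(samples)]
        scored = sorted(((score_fn(p), p) for p in perms),
                        key=lambda sp: sp[0], reverse=True)
        elites = scored[:top_k]
        if elites[0][0] > best_score:
            best_score, best = elites[0]
        # Assumed refit: ascend the average log-likelihood of the elites.
        for _, perm in elites:
            grad = pl_log_lik_grad(perm, log_scores)
            for j in range(n):
                log_scores[j] += lr * grad[j] / len(elites)
    return best, best_score, log_scores


def neg_inversions(perm):
    """Toy metric standing in for task accuracy: 0 for the identity order."""
    return -sum(1 for a in range(len(perm))
                for b in range(a + 1, len(perm)) if perm[a] > perm[b])
```

In practice `score_fn` would build a prompt from the ordered examples and return a task-level metric on held-out data. The toy `neg_inversions` metric only makes the loop runnable without a language model, for example via `fit_ordering(neg_inversions, n=4)`.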

Theorems & Definitions (1)

  • Theorem 1: Expressivity: Mixture-PL is dense, single PL is not (for $n\ge 3$)