Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification

Ye Jiang, Taihang Wang, Youzheng Liu, Yimin Wang, Yuhan Xia, Yunfei Long

TL;DR

In-context learning for text classification is highly sensitive to the choice of demonstrations. The paper introduces TopK+L2D, a two-stage approach: TopK first retrieves semantically similar demonstrations, and a fine-tuned small language model (SLM) then estimates the label distributions $P_{test}$ of the test input and $P_{pool}$ of each candidate so that candidates can be re-ranked by how closely the two distributions agree, measured with the $D_{KL}$ and $D_{JS}$ divergences. The final hybrid score blends semantic relevance with label-distribution alignment, improving accuracy across seven benchmarks and several model scales; LLM performance correlates positively with the accuracy of the SLM. By enforcing label consistency in addition to semantic similarity, the method makes demonstration selection more robust.
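For reference, the two divergences named above have the following standard textbook definitions. The label set $\mathcal{Y}$ and the mixture $M$ are notation introduced here for illustration; this summary does not state which argument order the paper uses for $D_{KL}$ or how the two divergences enter the final score.

$$
D_{KL}(P_{test} \,\|\, P_{pool}) = \sum_{y \in \mathcal{Y}} P_{test}(y) \log \frac{P_{test}(y)}{P_{pool}(y)}
$$

$$
D_{JS}(P_{test} \,\|\, P_{pool}) = \tfrac{1}{2} D_{KL}(P_{test} \,\|\, M) + \tfrac{1}{2} D_{KL}(P_{pool} \,\|\, M), \qquad M = \tfrac{1}{2}\left(P_{test} + P_{pool}\right)
$$

$D_{JS}$ is symmetric and, with the natural logarithm, bounded above by $\log 2$, whereas $D_{KL}$ is asymmetric and unbounded.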

Abstract

In-context learning (ICL) for text classification, which uses a few input-label demonstrations to describe a task, has demonstrated impressive performance on large language models (LLMs). However, the selection of in-context demonstrations plays a crucial role and can significantly affect LLMs' performance. Most existing demonstration selection methods focus primarily on semantic similarity between test inputs and demonstrations, often overlooking the importance of label distribution alignment. To address this limitation, we propose a two-stage demonstration selection method, TopK + Label Distribution Divergence (L2D), which leverages a fine-tuned BERT-like small language model (SLM) to generate label distributions for both test inputs and candidate demonstrations and to calculate the divergence between them. This enables the selection of demonstrations that are not only semantically similar to the test input but also aligned with it in label distribution. Extensive experiments across seven text classification benchmarks show that our method consistently outperforms previous demonstration selection strategies. Further analysis reveals a positive correlation between the performance of LLMs and the accuracy of the underlying SLMs used for label distribution estimation.
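The two-stage idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of cosine similarity, the choice of Jensen-Shannon divergence for re-ranking, and the particular hybrid score `alpha * similarity + (1 - alpha) * (1 - JS / log 2)` are all assumptions made for this example. The embeddings and label distributions are assumed to be precomputed, for instance by a sentence encoder and a fine-tuned SLM classifier respectively.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two label distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_demonstrations(test_emb, pool_embs, p_test, p_pool,
                          k=4, n_candidates=20, alpha=0.5):
    """Return indices of k pool items, best first (hypothetical helper)."""
    pool = np.asarray(pool_embs, dtype=float)
    test = np.asarray(test_emb, dtype=float)

    # Stage 1: keep the n_candidates pool items most similar to the test input.
    norms = np.linalg.norm(pool, axis=1) * np.linalg.norm(test) + 1e-12
    sims = pool @ test / norms
    candidates = np.argsort(-sims)[:n_candidates]

    # Stage 2: re-rank candidates by an assumed hybrid score that blends
    # similarity with label-distribution alignment (1 - normalized JS).
    alignment = np.array(
        [1.0 - js_divergence(p_test, p_pool[i]) / np.log(2) for i in candidates]
    )
    hybrid = alpha * sims[candidates] + (1.0 - alpha) * alignment
    best = np.argsort(-hybrid)[:k]
    return [int(i) for i in candidates[best]]


# Toy example: item 0 is the closest match semantically, but its label
# distribution disagrees with the test input; item 1 is nearly as close
# and agrees on the label distribution.
pool_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
p_pool = [[0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
chosen = select_demonstrations([1.0, 0.0], pool_embs, [0.9, 0.1], p_pool,
                               k=1, n_candidates=2, alpha=0.5)
print(chosen)
```

With `alpha=1.0` the sketch reduces to plain TopK retrieval, so the weight controls how strongly label-distribution alignment can override semantic similarity.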

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A comparison of 2-shot in-context demonstrations retrieved by different selection methods in SST-2. Although the test input is labeled as having a positive sentiment, the overall semantics are somewhat ambiguous or controversial. Our method effectively captures the adversative conjunction in the demonstrations and aligns the label distributions with that of the test input.
  • Figure 2: Average accuracy of LLMs at different scales across seven tasks. Our method consistently improves performance across the models tested. The red dotted line shows the performance averaged over all methods at each model scale.
  • Figure 3: (a) - (c) Performance comparison between original, arbitrary and reverse labels in in-context demonstrations across SST-2, Subj and CR. (d) Performance of our method on out-of-domain demonstration pools. 'A$\rightarrow$B' indicates that the demonstration pool is sourced from dataset 'A' while evaluation is conducted on dataset 'B'.
  • Figure 4: Performance comparison between different numbers of (a) in-context demonstrations and (b) candidates in the semantic retrieval stage.
  • Figure 5: Performance comparison of different $\alpha$ settings.