Unlocking the Power of LLM Uncertainty for Active In-Context Example Selection

Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu

TL;DR

The paper introduces Unc-TTP, a novel uncertainty classification framework that uses output inconsistency across three prompt perturbations to quantify intrinsic LLM uncertainty. It then leverages this inconsistency-defined uncertainty for active in-context learning, selecting informative 1-shot demonstrations from uncertain categories to improve performance across subjective text classification tasks. Empirical results show Unc-TTP-based uncertainty sampling outperforms traditional active learning baselines (e.g., Similarity, BM25, Diversity) and exhibits favorable scalability and transferability properties between strong and weak models. These findings suggest inconsistency-based uncertainty is a practical and robust signal to guide in-context example selection for both open- and closed-source LLMs, with implications for more trustworthy and effective AI assistants.

Abstract

Large Language Models (LLMs) have shown remarkable performance across a wide range of downstream tasks. However, it is challenging for users to discern whether an LLM's responses are generated with certainty or are fabricated to meet user expectations. In this paper, we introduce the Uncertainty Tripartite Testing Paradigm (Unc-TTP), a novel method for classifying LLM uncertainty by leveraging output inconsistency. Specifically, Unc-TTP performs three rounds of sampling under varying label injection interference, enumerates all possible outcomes, and uses the degree of output inconsistency as the indicator of the LLM's intrinsic uncertainty. To validate the effectiveness of this inconsistency-defined uncertainty, we draw inspiration from Active Learning, comparing the informativeness of actively selected in-context examples. Our experiments show that uncertainty examples selected via Unc-TTP are more informative than certainty examples. Furthermore, the Unc-TTP-guided uncertainty-based active example selection strategy outperforms existing methods, highlighting its effectiveness in classifying LLM uncertainty and enhancing in-context learning. This work not only underscores the potential of inconsistency-based uncertainty classification for both open- and closed-source LLMs but also presents a practical approach for leveraging uncertainty to improve LLM performance in real-world tasks.

Paper Structure (33 sections, 14 figures, 10 tables)

Figures (14)

  • Figure 1: Accuracy of four models with and without label injection. See Appendix \ref{sec:detailbehaviour} for a detailed analysis.
  • Figure 2: Distribution of instances where the LLM remains unwavering (certain) and wavering (uncertain) across three rounds of sampling, with {no (w/o), right, wrong} label injection (top) and without label injection, using only vanilla temperature sampling (bottom), across three datasets.
  • Figure 3: Illustration of the proposed Uncertainty Tripartite Testing Paradigm (Unc-TTP) on the Sarcasm Headlines (SH) dataset. For each instance, we employ Unc-TTP to evaluate the LLM's certainty level and label it as either certain or uncertain based on its combination of outcomes across the testing scenarios. Each category consists of three components, corresponding to the three individual testing scenarios in the order of {no-label, right-label, wrong-label}. If the LLM answers incorrectly under a given setting, that component is labeled 0; otherwise, it is labeled 1. We interpret instances where the LLM wavers between answers as indicative of uncertainty. Conversely, instances where the LLM is unwaveringly right or wrong are considered certain.
  • Figure 4: The relative improvement in accuracy of inconsistency-defined uncertainty examples compared to certainty examples in ICL for each model. The uncertainty examples from sampling-based methods (the bottom 2 rows) demonstrate greater robustness than those from verification-based methods (the top 2 rows), where positive accuracy gains (highlighted in red) are observed more consistently than negative ones (highlighted in blue).
  • Figure 5: Accuracy scores of Mistral $K$-way $N$-shot experiments on the test set. Our Unc-TTP robustly surpasses the random baseline and the sampling-based method when the shot number is small.
  • ...and 9 more figures
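
The categorization described in the Figure 3 caption can be illustrated with a minimal Python sketch. It assumes the three model predictions (one per testing scenario) have already been collected; the function names `unc_ttp_category` and `is_certain`, the string encoding of each category, and the example labels are hypothetical choices made for illustration, not the authors' implementation.

```python
from itertools import product

# Order of testing scenarios, as given in the Figure 3 caption.
SETTINGS = ("no-label", "right-label", "wrong-label")


def unc_ttp_category(predictions, gold):
    """Encode per-scenario correctness as a 3-character string such as '110'.

    predictions: three model answers, ordered as in SETTINGS.
    gold: the reference label for the instance.
    Each position is '1' if the answer is correct, '0' otherwise.
    """
    if len(predictions) != len(SETTINGS):
        raise ValueError("expected one prediction per testing scenario")
    return "".join("1" if p == gold else "0" for p in predictions)


def is_certain(category):
    """Unwaveringly right ('111') or unwaveringly wrong ('000') counts as certain."""
    return category in ("000", "111")


# Enumerate all 2^3 possible outcome categories.
ALL_CATEGORIES = ["".join(bits) for bits in product("01", repeat=3)]
UNCERTAIN_CATEGORIES = [c for c in ALL_CATEGORIES if not is_certain(c)]

if __name__ == "__main__":
    # Hypothetical instance: the model is right without a label and with the
    # right label injected, but flips when the wrong label is injected.
    cat = unc_ttp_category(["sarcastic", "sarcastic", "not_sarcastic"], "sarcastic")
    print(cat, "certain" if is_certain(cat) else "uncertain")
```

Under this encoding, two of the eight categories are certain and the remaining six are uncertain; the uncertain ones are the pool from which the paper's active selection strategy draws in-context examples.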