Uncertainty Unveiled: Can Exposure to More In-context Examples Mitigate Uncertainty for Large Language Models?

Yifei Wang, Yu Sheng, Linjing Li, Daniel Zeng

TL;DR

Uncertainty Unveiled analyzes how increasing in-context demonstrations affects the trustworthiness of large language models (LLMs) under long-context ICL. It introduces a Bayesian uncertainty quantification framework that partitions total uncertainty ($TU$) into epistemic ($EU$) and aleatoric ($AU$) components and shows that additional in-context examples mainly reduce $EU$, enhancing performance by injecting task-specific knowledge. The study finds that these benefits persist at large model scales but can be tempered for complex reasoning tasks by rising $AU$. It also reveals internal mechanisms, via residual-stream projections and logit-margin amplification, that explain the uncertainty reductions. Practically, the work suggests favoring diverse, information-rich demonstrations and provides interpretability directions for understanding how internal confidence evolves during long-context ICL, with implications for deploying trustworthy prompting strategies in high-stakes settings.
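The TU/AU/EU split can be illustrated with the standard entropy-based (mutual-information) decomposition used in Bayesian uncertainty quantification. This is a minimal sketch, assuming each row of `probs` is the predictive distribution over answer classes obtained under one sampled configuration. Which sampling factor plays that role, and the paper's exact estimator, may differ. Here TU is the entropy of the averaged distribution, AU is the average per-configuration entropy, and EU is their difference, which is the mutual information and therefore non-negative.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats along `axis`; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so p = 0 entries vanish
    return -np.sum(p * np.log(safe), axis=axis)

def decompose_uncertainty(probs):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    probs: shape (M, C); row m is the distribution over C answer classes under
    the m-th sampled configuration (an assumed stand-in for the paper's sampling).
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)          # model-averaged predictive distribution
    tu = float(entropy(mean_p))          # total uncertainty: entropy of the average
    au = float(entropy(probs).mean())    # aleatoric: expected per-configuration entropy
    eu = tu - au                         # epistemic: mutual information (>= 0)
    return tu, au, eu
```

For example, two configurations that confidently disagree (`[[1, 0], [0, 1]]`) give TU = EU = ln 2 and AU = 0. Two identical uniform distributions (`[[0.5, 0.5], [0.5, 0.5]]`) give AU = ln 2 and EU = 0: the model is consistently unsure rather than uncertain about which configuration is right.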

Abstract

Recent advances in handling long sequences have facilitated the exploration of long-context in-context learning (ICL). While much of the existing research emphasizes performance improvements driven by additional in-context examples, the influence on the trustworthiness of generated responses remains underexplored. This paper addresses this gap by investigating how additional examples affect predictive uncertainty, an essential aspect of trustworthiness. We begin by systematically quantifying the uncertainty of ICL with varying shot counts, analyzing the impact of example quantity. Through uncertainty decomposition, we introduce a novel perspective on performance enhancement, with a focus on epistemic uncertainty (EU). Our results reveal that additional examples reduce total uncertainty in both simple and complex tasks by injecting task-specific knowledge, thereby diminishing EU and enhancing performance. For complex tasks, these advantages emerge only after addressing the increased noise and uncertainty associated with longer inputs. Finally, we explore the evolution of internal confidence across layers, unveiling the mechanisms driving the reduction in uncertainty.
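To make the layer-wise analysis of internal confidence concrete, here is a minimal logit-lens-style sketch. It assumes that projecting each layer's residual stream through the unembedding matrix is a reasonable proxy for the residual-stream projections mentioned above. The arrays `resid` and `W_U`, the parameter-free normalization before unembedding, and the restriction to answer-label tokens (`label_ids`) are illustrative assumptions, not the paper's code. For each layer, the sketch returns the margin between the two largest label logits and the entropy over labels. Under this reading, "logit-margin amplification" would appear as margins that grow across layers while entropy shrinks.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Parameter-free LayerNorm over the last axis (a simplifying assumption)."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def layerwise_confidence(resid, W_U, label_ids):
    """Logit-lens projection of per-layer residual streams onto answer labels.

    resid:     (L, d) residual-stream vectors at the answer position, one per layer
    W_U:       (d, V) unembedding matrix
    label_ids: vocabulary ids of the candidate answer labels (hypothetical)
    Returns (margin, entropy), each of shape (L,).
    """
    logits = layer_norm(resid) @ W_U                 # (L, V): vocabulary projection per layer
    label_logits = logits[:, label_ids]              # (L, C): keep candidate answers only
    top2 = np.sort(label_logits, axis=-1)[:, -2:]    # two largest label logits per layer
    margin = top2[:, 1] - top2[:, 0]                 # top-1 minus top-2 logit margin
    z = label_logits - label_logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # stable log-softmax
    entropy = -(np.exp(log_p) * log_p).sum(axis=-1)             # label entropy in nats
    return margin, entropy

# Toy usage with random tensors (hypothetical sizes: 12 layers, d=64, vocab=100).
rng = np.random.default_rng(0)
resid = rng.normal(size=(12, 64))
W_U = rng.normal(size=(64, 100))
margin, entropy = layerwise_confidence(resid, W_U, label_ids=[5, 17, 42])
```

With random tensors the curves carry no meaning. With a real model one would feed in the hidden states at the final prompt position for each shot count k and compare how the margin and entropy trajectories change as k grows.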

Paper Structure

This paper contains 56 sections, 5 equations, 20 figures, 12 tables.

Figures (20)

  • Figure 1: Humans tend to gain task-specific knowledge and confidence as they are exposed to more examples. This raises a natural question: can additional examples similarly reduce uncertainty in LLMs?
  • Figure 2: The sources of AU and EU in many-shot ICL. AU arises from the prompt $\Omega$, e.g., the large number of examples and the demonstration selection process. EU originates on the model side, encompassing the generation and decoding processes.
  • Figure 3: A workflow for uncertainty quantification and decomposition under many-shot ICL settings, involving the following components: an LLM $\mathcal{M}$ that supports long context windows, demonstration set selection, generation sampling, and the UQ modules described in the paper's uncertainty quantification and decomposition sections.
  • Figure 4: Average TU under $k$-shot ICL, with error bands over three runs.
  • Figure 5: Average accuracy under $k$-shot ICL, with error bands over three runs.
  • ...and 15 more figures
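The workflow in Figure 3 and the averaged TU curve in Figure 4 can be approximated by a simple shot-count sweep. This sketch assumes a hypothetical `predict_proba(prompt)` callable that returns the model's probability distribution over answer labels. It also assumes a hypothetical `Input:`/`Label:` prompt template and random demonstration selection with one seed per run. For each k, it reports the mean and standard deviation of the average per-query TU across three runs. The standard deviation is one plausible choice for what the error bands summarize; the paper may use a different statistic.

```python
import random
import numpy as np

def build_prompt(demos, query, k, rng):
    """Sample k (input, label) demonstrations without replacement; format a k-shot prompt."""
    chosen = rng.sample(demos, k)
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in chosen]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

def total_uncertainty(probs):
    """Entropy (nats) of one predictive distribution over answer labels."""
    p = np.asarray(probs, dtype=float)
    safe = np.where(p > 0, p, 1.0)
    return float(-(p * np.log(safe)).sum())

def tu_curve(predict_proba, demos, queries, shots, runs=3, seed=0):
    """Map each k to (mean, std) of average TU over `runs` demonstration draws."""
    results = {}
    for k in shots:
        run_means = []
        for r in range(runs):
            rng = random.Random(seed + r)  # a different demonstration set per run
            tus = [total_uncertainty(predict_proba(build_prompt(demos, q, k, rng)))
                   for q in queries]
            run_means.append(np.mean(tus))
        results[k] = (float(np.mean(run_means)), float(np.std(run_means)))
    return results
```

In practice, `predict_proba` would wrap a long-context LLM and read off label probabilities. For testing, any function that maps a prompt string to a list of label probabilities works, such as a toy that grows more confident as the number of `Input:` blocks increases.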