Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents

Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci

TL;DR

This paper argues that the traditional aleatoric/epistemic uncertainty dichotomy is ill-suited for open, interactive LLM-agent settings and highlights conflicts even in simple theoretical cases. It proposes three research directions—underspecification uncertainties, interactive learning, and output uncertainties—to better capture and communicate uncertainty during multi-turn human-computer interactions. By reviewing conflicting foundations and empirical results, it advocates moving beyond scalar uncertainty measures toward richer, interaction-aware representations and communicative formats. The work aims to enhance transparency, trust, and accessibility of LLM agents operating in dynamic, information-sparse environments with users.

Abstract

Large-language models (LLMs) and chatbot agents are known to provide wrong outputs at times, and it was recently found that this can never be fully prevented. Hence, uncertainty quantification plays a crucial role, aiming to quantify the level of ambiguity in either one overall number or two numbers for aleatoric and epistemic uncertainty. This position paper argues that this traditional dichotomy of uncertainties is too limited for the open and interactive setup that LLM agents operate in when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: underspecification uncertainties, for when users do not provide all information or define the exact task at the first go; interactive learning, to ask follow-up questions and reduce the uncertainty about the current context; and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers. We expect that these new ways of dealing with and communicating uncertainties will lead to LLM agent interactions that are more transparent, trustworthy, and intuitive.

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The traditional view on uncertainties suggests a clear black-and-white dichotomy between aleatoric and epistemic uncertainty. We argue that recent developments show this dichotomy is not that simple, and not helpful for developing LLM agents.
  • Figure 2: In a binary prediction, the learner may have a belief that the Bernoulli probability is either high or low. Some schools of thought see this as a case of maximum epistemic uncertainty, whereas others see it as nearly minimal epistemic uncertainty.
  • Figure 3: Using an overly simple model class, such as a linear model to fit quadratic data, leads to wide uncertainty estimates. The question is whether this uncertainty is irreducible, and thus aleatoric. Bayes-optimality schools of thought would argue that yes, it is irreducible within the model class and thus aleatoric, whereas data-uncertainty schools of thought would argue that it is reducible by choosing a better-suited model class, and hence not aleatoric.
  • Figure 4: Estimates of aleatoric and epistemic uncertainty often cannot be disentangled. This plot is reproduced with permission from Mucsányi et al. (2024), where the information-theoretical decomposition (eq:information_theoretical) was used to split the aleatoric and epistemic uncertainty of a deep ensemble trained on ImageNet-1k. The estimates end up being nearly perfectly correlated, thus capturing the same uncertainty in practice.
  • Figure 5: arXiv preprints in computer science, statistics, and math that include the terms "aleatoric" or "epistemic" in their title or abstract. Usage is at an all-time high, with roughly one paper published per day in 2024.
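
The captions of Figures 2 and 4 refer to splitting a prediction's uncertainty into an aleatoric and an epistemic part. The paper's equation is not reproduced on this page, so the following is only a minimal sketch under the assumption that it is the commonly used entropy-based split: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual ensemble members, and epistemic uncertainty is the difference (the mutual information). The function names `entropy` and `decompose` and the toy ensembles are hypothetical and not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def decompose(member_probs):
    """Entropy-based split of an ensemble's predictive uncertainty.

    member_probs: list of per-member class-probability lists.
    Returns (total, aleatoric, epistemic), all in bits.
    This is the common information-theoretic split, assumed here
    for illustration; it may differ from the paper's exact equation.
    """
    n = len(member_probs)
    num_classes = len(member_probs[0])
    # Average the members' predictions class by class.
    mean = [sum(m[c] for m in member_probs) / n for c in range(num_classes)]
    total = entropy(mean)                                    # entropy of the mean
    aleatoric = sum(entropy(m) for m in member_probs) / n    # mean of the entropies
    epistemic = total - aleatoric                            # mutual information
    return total, aleatoric, epistemic


# Toy version of the Figure 2 setting: two members that each commit to
# a different confident answer for a binary prediction.
disagreeing = [[0.9, 0.1], [0.1, 0.9]]
# Members that agree the outcome is a coin flip.
agreeing = [[0.5, 0.5], [0.5, 0.5]]

for name, ensemble in [("disagreeing", disagreeing), ("agreeing", agreeing)]:
    total, aleatoric, epistemic = decompose(ensemble)
    print(f"{name}: total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

Both toy ensembles have the same total uncertainty (one bit), but this particular split attributes part of it to epistemic uncertainty only when the members disagree. Other schools of thought would read the same belief differently, which is the conflict Figure 2 describes.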