
Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Changdae Oh, Seongheon Park, To Eun Kim, Jiatong Li, Wendi Li, Samuel Yeh, Xuefeng Du, Hamed Hassani, Paul Bogdan, Dawn Song, Sharon Li

TL;DR

The paper argues that uncertainty quantification for LLMs must move beyond static, single-turn QA to agentic, interactive, long-horizon settings where actions, observations, and environment states unfold over time. It introduces a general mathematical formulation that models the agent's trajectory as a stochastic process and shows how existing UQ approaches are special cases. This reveals a key limitation: traditional UQ treats uncertainty as monotonically accumulating. To address this, it proposes a conditional uncertainty reduction process with an information-gating mechanism that distinguishes interactive, evidential actions from non-interactive ones, which makes uncertainty reducible and yields analytic bounds. The work discusses practical implications for frontier LLMs, healthcare, software engineering, and robotics, and outlines open problems (benchmarks, long-horizon estimation, and multi-agent dynamics), laying a foundation for safer, uncertainty-aware agentic systems. Overall, the framework provides principled guidance for designing uncertainty-aware LLM agents that interact with users and tools while actively managing risk.

Abstract

Uncertainty quantification (UQ) for large language models (LLMs) is a key building block for the safety guardrails of everyday LLM applications. Yet, even as LLM agents are increasingly deployed in highly complex tasks, most UQ research still centers on single-turn question answering. We argue that UQ research must shift to realistic settings with interactive agents, and that a new principled framework for agent UQ is needed. This paper presents the first general formulation of agent UQ that subsumes broad classes of existing UQ setups. Under this formulation, we show that prior works implicitly treat LLM UQ as an uncertainty accumulation process, a viewpoint that breaks down for interactive agents in an open world. In contrast, we propose a novel perspective, a conditional uncertainty reduction process, that explicitly models reducible uncertainty over an agent's trajectory by highlighting the "interactivity" of actions. From this perspective, we outline a conceptual framework that provides actionable guidance for designing UQ in LLM agent setups. Finally, we conclude with practical implications of agent UQ for frontier LLM development and domain-specific applications, as well as remaining open problems.

Paper Structure

This paper contains 46 sections, 2 theorems, 9 equations, 4 figures, and 2 tables.

Key Result

Lemma 1 (Extrema of Information Gating)

Let $\tilde{U}(\mathcal{F}_{\leq T})$ be the lower bound of the agent's total uncertainty in the information-gating equation, and let $U(X):=H(X)=\mathbb{E}[-\log P(X)]$ and $\text{Info}(X;Y):=I(X;Y)=\mathbb{E}\left[\log\frac{P(X,Y)}{P(X)P(Y)}\right]$. Then we have:
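
Lemma 1 builds on two standard quantities: Shannon entropy $H$ and mutual information $I$. The Python below is a minimal sketch and not the paper's estimator. It computes both in nats for small discrete distributions. It then applies the identity $H(Y \mid O) = H(Y) - I(Y;O)$ to a hypothetical two-outcome example. In that example, a hidden user intent $Y$ is observed through $O$, which comes either from an interactive action (asking the user, assumed correct 90% of the time) or from a non-interactive one (independent of $Y$). The helper names `entropy`, `marginal`, `mutual_information`, and `make_joint` and the 0.9 accuracy are illustrative assumptions.

```python
import math
from collections import defaultdict


def entropy(p):
    """U(X) = H(X) = E[-log P(X)] in nats, for a dict {outcome: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def marginal(joint, axis):
    """Marginalize a joint dict {(x, y): probability} onto coordinate `axis`."""
    m = defaultdict(float)
    for xy, q in joint.items():
        m[xy[axis]] += q
    return dict(m)


def mutual_information(joint):
    """Info(X;Y) = I(X;Y) = E[log P(X,Y) / (P(X)P(Y))] in nats."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(q * math.log(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)


def make_joint(accuracy):
    """Hypothetical toy: binary hidden intent Y (uniform) and an observation O
    that equals Y with probability `accuracy`."""
    return {(y, o): 0.5 * (accuracy if y == o else 1 - accuracy)
            for y in (0, 1) for o in (0, 1)}


# Interactive action (e.g., asking the user): O is informative about Y.
interactive = make_joint(0.9)
# Non-interactive action: O is independent of Y.
non_interactive = make_joint(0.5)

for name, joint in [("interactive", interactive),
                    ("non-interactive", non_interactive)]:
    h_y = entropy(marginal(joint, 0))
    info = mutual_information(joint)
    # H(Y | O) = H(Y) - I(Y; O): uncertainty about Y left after observing O.
    print(f"{name}: H(Y)={h_y:.4f}  I(Y;O)={info:.4f}  H(Y|O)={h_y - info:.4f}")
```

Under these assumptions, the interactive observation carries about 0.368 nats of information about $Y$ and leaves $H(Y \mid O) \approx 0.325$ nats. The non-interactive observation carries none, so $H(Y \mid O) = H(Y) = \log 2$. This only illustrates why interactivity can make uncertainty reducible. The paper's gating mechanism and bounds are not reproduced here.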

Figures (4)

  • Figure 1: Comparison between UQ setups. Traditional LLM UQ (a) measures the uncertainty of an answer given a question, whereas UQ for LLM reasoning (b) expands the problem by considering multi-step responses rather than a single response. Agent UQ (c) goes further by considering continual interactions between the agent and the user/environment across the trajectory, making it a multi-turn, interactive inference setup (example sourced from $\tau$-bench Airline; Barres et al., 2025).
  • Figure 2: Graphical model for an agent's problem-solving trajectory, with examples. Given a task specification $E_0$ and an initial user query $O_0$, an agent unrolls a multi-turn trajectory characterized by a chain of actions $A$, observations $O$, and environment states $E$. This simple abstraction describes representative agentic prompting methods such as ReAct (Yao et al., 2022). See the appendix on the formulation of prompting methods for details.
  • Figure 3: Limitation of existing UQ (a) and our suggestion (b). Prior works consider only evidentiality when designing or evaluating UQ and neglect interactivity, so they may fail to reliably capture agent failures. We urge expanding the dimensions of interest to include interactivity and moving toward reducible uncertainty modeling of agents.
  • Figure 4: Illustration of the proposed agent UQ paradigm. We propose the conditional uncertainty reduction process for LLM agents, which distinguishes interactive, evidential actions from the others.
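
A simple container can mirror the trajectory abstraction in Figure 2. The Python sketch below is a hypothetical illustration, not code from the paper. It stores the task specification $E_0$, the initial query $O_0$, and a list of steps $(A_t, O_t, E_t)$. It also adds an assumed boolean `interactive` flag, so that steps whose actions engage the user or environment can be picked out, reflecting the distinction drawn in Figure 4. The class names, field names, 1-based turn indexing, and airline-style action strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One turn t of the trajectory: action A_t, observation O_t, env state E_t."""
    action: str
    observation: str
    env_state: str
    # Hypothetical flag: True if A_t engages the user or environment
    # (e.g., asks a question or calls a tool) and can return new evidence.
    interactive: bool = False


@dataclass
class Trajectory:
    task_spec: str      # E_0: task specification
    initial_query: str  # O_0: initial user query
    steps: List[Step] = field(default_factory=list)

    def append(self, action: str, observation: str, env_state: str,
               interactive: bool = False) -> None:
        self.steps.append(Step(action, observation, env_state, interactive))

    def interactive_steps(self) -> List[int]:
        """Turn indices t (starting at 1) whose actions are flagged interactive."""
        return [t for t, s in enumerate(self.steps, start=1) if s.interactive]


# Hypothetical airline-style episode; tool names are made up for illustration.
traj = Trajectory(task_spec="airline customer-service policy",
                  initial_query="I need to change my flight.")
traj.append("ask_user: What is your booking ID?", "ABC123",
            "booking ID known", interactive=True)
traj.append("think: check the change policy", "", "unchanged")
traj.append("lookup_reservation(ABC123)", "economy, refundable",
            "reservation loaded", interactive=True)
print(traj.interactive_steps())  # [1, 3]
```

Keeping the flag on each step, rather than in a separate list, keeps every action next to the observation and state it produced. A gating rule of any form could then be applied turn by turn.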

Theorems & Definitions (6)

  • Definition 1: Stochastic Agent System
  • Definition 2: Agent UQ
  • Lemma 1: Extrema of Information Gating
  • Proof
  • Lemma 2: Restatement of Lemma 1
  • Proof