Structured Uncertainty guided Clarification for LLM Agents
Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha
TL;DR
This work tackles ambiguity in tool calling for LLM agents by grounding disambiguation in structured tool schemas and modeling joint tool-argument clarification as a POMDP. It introduces SAGE-Agent, which uses a Bayesian, cost-aware EVPI criterion to select clarifying questions and update domain constraints, raising coverage on ambiguous tasks by 7-39% while asking 1.5-2.7× fewer clarification questions than prompting and uncertainty-based baselines. The paper also presents ClarifyBench, a multi-domain benchmark with realistic LLM-based user simulation, and shows that structured uncertainty provides an effective training signal: uncertainty-weighted GRPO training lifts When2Call accuracy from 36.5% to 65.2% (3B model) and from 36.7% to 62.9% (7B model). Overall, structured uncertainty offers a principled, efficient framework for reliable tool-augmented agents in domains such as document editing, vehicle control, and travel booking.
Abstract
LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures. We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with an Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy. Our SAGE-Agent leverages this structured uncertainty to improve both coverage and efficiency: it increases coverage on ambiguous tasks by 7-39\% while reducing clarification questions by 1.5-2.7$\times$ compared to strong prompting and uncertainty-based baselines. We present ClarifyBench, the first multi-turn tool-augmented disambiguation benchmark with realistic LLM-based user simulation across diverse domains including document editing, vehicle control, and travel booking. Additionally, we demonstrate that structured uncertainty provides effective training signals for reinforcement learning, boosting When2Call accuracy from 36.5\% to 65.2\% (3B model) and from 36.7\% to 62.9\% (7B model) through uncertainty-weighted GRPO training. These results establish structured uncertainty as a principled, efficient approach for tool-augmented agents, improving both task success and interaction efficiency in real-world scenarios.
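To make the EVPI-based, cost-aware question selection described above more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a discrete belief over joint argument assignments for a single tool call, a 0/1 utility (the call is either exactly right or wrong), and a fixed per-argument question cost. The names `evpi`, `select_question`, the example arguments, and all probabilities and costs are hypothetical.

```python
from collections import defaultdict

def evpi(candidates, arg):
    """Expected gain in P(correct call) from learning the true value of `arg`.

    candidates: list of (assignment dict, probability) pairs that sum to 1.
    Assumption: utility is 1 if the executed call matches the user's intent, else 0,
    so the value of acting is the probability of the most likely candidate.
    """
    best_now = max(p for _, p in candidates)
    # For each possible answer v, keep the mass of the best candidate consistent with it.
    # Summing these equals sum_v P(arg = v) * max_c P(c | arg = v).
    best_by_answer = defaultdict(float)
    for assignment, p in candidates:
        v = assignment[arg]
        best_by_answer[v] = max(best_by_answer[v], p)
    return sum(best_by_answer.values()) - best_now

def select_question(candidates, costs):
    """Pick the argument with the highest EVPI minus cost, or None if no question pays off."""
    scores = {arg: evpi(candidates, arg) - cost for arg, cost in costs.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)

# Hypothetical belief for a travel-booking call with two ambiguous arguments.
belief = [
    ({"city": "Paris", "date": "Mon"}, 0.4),
    ({"city": "Paris", "date": "Tue"}, 0.4),
    ({"city": "Rome", "date": "Mon"}, 0.2),
]
question, net_value = select_question(belief, {"city": 0.1, "date": 0.1})
print(question, round(net_value, 3))
```

In this toy belief, asking about `date` is preferred because it separates the two most likely candidates, while asking about `city` only rules out the least likely one. When every net score is non-positive, the sketch returns `None`, which corresponds to executing the call without further clarification.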
