TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning
Sina Tayebati, Divake Kumar, Nastaran Darabi, Davide Ettori, Ranganath Krishnan, Amit Ranjan Trivedi
TL;DR
TRACER tackles uncertainty estimation in long, multi-turn tool-using dialogues by reframing failures as sparse, trajectory-level events rather than token-level mistakes. It combines Stage I content-aware surprisal, Stage II situational-awareness indicators for repetition and coherence, and Stage III MAX-based tail-risk aggregation to produce a robust trajectory risk score with theoretical guarantees. Applied to the $τ^2$-bench dual-control environment, TRACER improves failure prediction (AUROC) and selective execution (AUARC) and provides earlier warning signals than token-based baselines. The approach enables safer, more reliable abstention and intervention in real-world agentic systems that involve human-in-the-loop interaction and tool use.
Abstract
Estimating uncertainty for AI agents in real-world multi-turn, tool-using interactions with humans is difficult because failures are often triggered by sparse critical episodes (e.g., looping, incoherent tool use, or user-agent miscoordination) even when local generation appears confident. Existing uncertainty proxies focus on single-shot text generation and therefore miss these trajectory-level breakdown signals. We introduce TRACER, a trajectory-level uncertainty metric for dual-control Tool-Agent-User interaction. TRACER combines content-aware surprisal with situational-awareness signals (semantic and lexical repetition, and tool-grounded coherence gaps) and aggregates them using a tail-focused risk functional with a MAX-composite step risk to surface decisive anomalies. We evaluate TRACER on $τ^2$-bench on two tasks: predicting task failure and selective task execution. On these tasks, TRACER improves AUROC by up to 37.1% and AUARC by up to 55% over baselines, enabling earlier and more accurate detection of uncertainty in complex conversational tool-use settings. Our code and benchmark are available at https://github.com/sinatayebati/agent-tracer.
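As a rough illustration of the aggregation idea described above, the following minimal sketch computes a MAX-composite step risk and a tail-focused trajectory score. It is not the released implementation: the function names, the assumption that each signal is already normalized to [0, 1], and the choice of a top-fraction mean as the tail functional (with a hypothetical `tail_fraction` parameter) are all assumptions made for this example.

```python
import math
from typing import Sequence


def step_risk(surprisal: float, repetition: float, incoherence: float) -> float:
    """MAX-composite step risk: a step is as risky as its most alarming signal.

    Assumes (for this sketch) that all three signals are normalized to [0, 1].
    """
    return max(surprisal, repetition, incoherence)


def trajectory_risk(step_risks: Sequence[float], tail_fraction: float = 0.2) -> float:
    """Tail-focused aggregation: mean of the worst `tail_fraction` of steps.

    This top-fraction mean is an assumed stand-in for a tail-risk functional;
    it lets a few critical steps dominate the score instead of being averaged away.
    """
    if not step_risks:
        raise ValueError("trajectory must contain at least one step")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    k = max(1, math.ceil(tail_fraction * len(step_risks)))
    worst = sorted(step_risks, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical five-step trajectory: (surprisal, repetition, incoherence) per step.
signals = [
    (0.10, 0.00, 0.10),
    (0.20, 0.90, 0.10),  # one looping episode with high repetition
    (0.10, 0.10, 0.20),
    (0.15, 0.05, 0.10),
    (0.10, 0.00, 0.05),
]
risks = [step_risk(*s) for s in signals]
mean_score = sum(risks) / len(risks)
tail_score = trajectory_risk(risks, tail_fraction=0.2)
```

In this toy trajectory the single looping step dominates the tail-focused score, while a plain mean over steps dilutes it. That contrast is the motivation for treating failures as sparse, trajectory-level events.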
