Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation
Shiyuan Yin, Chenjia Bai, Zihao Zhang, Junwei Jin, Xinxin Zhang, Chi Zhang, Xuelong Li
TL;DR
This work tackles the challenge of unreliable planning by LLMs in robotics due to hallucinations and instruction ambiguity. It introduces CURE, a plug-and-play framework that decomposes planning uncertainty into epistemic (task clarity and task familiarity) and intrinsic (expected success rate) components, estimated via random network distillation (RND) and multi-layer perceptron (MLP) regression heads driven by LLM features. The approach is validated on kitchen manipulation and tabletop rearrangement tasks, where its uncertainty estimates correlate more strongly with actual execution outcomes than those of baselines and yield substantial improvements in the SR-HR-AUC metric. The results demonstrate the practical value of granular uncertainty modeling for safer, more reliable embodied planning with minimal integration overhead. Future work will address generalization to broader task sets and integrate physical reasoning to further enhance robustness.
Abstract
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. While researchers have explored uncertainty estimation to improve the reliability of LLM-based planning, existing studies have not sufficiently differentiated between epistemic and intrinsic uncertainty, limiting the effectiveness of uncertainty estimation. In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes planning uncertainty into epistemic and intrinsic components, each estimated separately. Furthermore, epistemic uncertainty is subdivided into task clarity and task familiarity for more accurate evaluation. The overall uncertainty assessments are obtained using random network distillation and multi-layer perceptron regression heads driven by LLM features. We validate our approach in two distinct experimental settings: kitchen manipulation and tabletop rearrangement. The results show that, compared to existing methods, our approach yields uncertainty estimates that are more closely aligned with the actual execution outcomes.
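To make the random network distillation idea mentioned in the abstract more concrete, the following is a minimal sketch of how an RND-style score can act as a familiarity signal. It is not the authors' implementation. Everything here is an assumption for illustration: the feature vectors are synthetic stand-ins for LLM features, the names `make_target`, `fit_predictor`, and `rnd_score` are hypothetical, and the predictor is a ridge-regressed linear map swapped in for the trained neural network that RND normally uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_target(in_dim, out_dim, hidden=64):
    # Fixed, randomly initialised target network; it is never trained.
    w1 = rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim)
    w2 = rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)
    return lambda x: np.tanh(x @ w1) @ w2

def fit_predictor(x, y, ridge=1e-3):
    # Simplification (assumption): a linear predictor fitted in closed form
    # with ridge regression, instead of a network trained by gradient descent.
    xb = np.hstack([x, np.ones((len(x), 1))])
    a = xb.T @ xb + ridge * np.eye(xb.shape[1])
    w = np.linalg.solve(a, xb.T @ y)
    return lambda q: np.hstack([q, np.ones((len(q), 1))]) @ w

def rnd_score(target, predictor, x):
    # Per-sample squared prediction error; larger means less familiar.
    return np.mean((target(x) - predictor(x)) ** 2, axis=1)

dim = 16
target = make_target(dim, out_dim=8)

# Synthetic "familiar task" features (stand-ins for LLM features).
familiar = rng.normal(loc=0.0, scale=0.1, size=(500, dim))
predictor = fit_predictor(familiar, target(familiar))

# Held-out features from the same region versus a far-away region.
seen = rng.normal(loc=0.0, scale=0.1, size=(100, dim))
unseen = rng.normal(loc=3.0, scale=0.1, size=(100, dim))

seen_score = rnd_score(target, predictor, seen).mean()
unseen_score = rnd_score(target, predictor, unseen).mean()
print(f"familiar-region error: {seen_score:.6f}")
print(f"unfamiliar-region error: {unseen_score:.6f}")
```

The predictor only matches the target where it has seen data, so the error should come out larger on inputs far from the fitted region. In a pipeline like the one the abstract describes, such a score could serve as one input to the task-familiarity estimate, with the MLP regression heads handling the other components; how CURE combines them is not specified in this section.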
