From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents

SeungWon Seo; SooBin Lim; SeongRae Noh; Haneul Kim; HyeongYeop Kang

TL;DR

This work addresses planning under uncertainty for decentralized, partially observable embodied agents by turning the latent assumptions embedded in LLM reasoning into a structured decision tree. The Planner-Composer-Evaluator (PCE) framework extracts these assumptions, aggregates them into a scenario tree, and scores each path by scenario likelihood, conditional gain, and execution cost, so that the agent can choose actions without heavy inter-agent dialogue. Across two benchmarks (C-WAH and TDW-MAT) and three LLM backbones, PCE outperforms communication-centric baselines in success rate and task efficiency while keeping token usage comparable. Ablations show that the gains from scaling model capacity or reasoning depth persist when PCE is applied. A user study indicates that human partners perceive PCE's communication patterns as more efficient and trustworthy, highlighting the practical value of principled, uncertainty-aware planning in collaborative robotics and embodied AI.

Abstract

Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intentions. Recent advances in applying Large Language Models (LLMs) to embodied agents have addressed many long-standing challenges, such as high-level goal decomposition and online adaptation. Yet uncertainty is still primarily mitigated through frequent inter-agent communication. This incurs substantial token and time costs and can disrupt established workflows when human partners are involved. We introduce PCE, a Planner-Composer-Evaluator framework that converts the fragmented assumptions latent in LLM reasoning traces into a structured decision tree. Internal nodes encode environment assumptions and leaves map to actions; each path is then scored by scenario likelihood, goal-directed gain, and execution cost to guide rational action selection without heavy communication. Across two challenging multi-agent benchmarks (C-WAH and TDW-MAT) and three diverse LLM backbones, PCE consistently outperforms communication-centric baselines in success rate and task efficiency while showing comparable token usage. Ablation results indicate that the performance gains obtained by scaling model capacity or reasoning depth persist when PCE is applied, and that PCE consistently raises the baseline across both capacity and reasoning-depth scales, confirming that structured uncertainty handling complements both forms of scaling. A user study further demonstrates that PCE produces communication patterns that human partners perceive as more efficient and trustworthy. Together, these results establish a principled route for turning latent LLM assumptions into reliable strategies for uncertainty-aware planning.
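The tree described in the abstract (assumptions at internal nodes, actions at leaves, paths scored by likelihood, gain, and cost) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the example scenario and its numbers, and in particular the rule `likelihood * gain - cost`, which is one plausible way to combine the three quantities and is not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Leaf:
    """A leaf maps a scenario to a concrete action."""
    action: str
    gain: float  # hypothetical goal-directed gain if the scenario holds
    cost: float  # hypothetical execution cost (e.g., number of steps)


@dataclass
class Assumption:
    """An internal node encodes one assumption about the environment."""
    text: str
    prob: float  # hypothetical probability of the assumption given its ancestors
    children: List[Union["Assumption", Leaf]] = field(default_factory=list)


def iter_paths(node, likelihood: float = 1.0,
               trail: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Leaf, float]]:
    """Yield (assumptions on the path, leaf, path likelihood) for every leaf."""
    if isinstance(node, Leaf):
        yield trail, node, likelihood
        return
    likelihood *= node.prob  # path likelihood is the product of branch probabilities
    for child in node.children:
        yield from iter_paths(child, likelihood, trail + (node.text,))


def utility(likelihood: float, gain: float, cost: float) -> float:
    """Assumed scoring rule for illustration only (not the paper's formula)."""
    return likelihood * gain - cost


def select_action(root: Assumption) -> Tuple[str, float]:
    """Return the action of the highest-scoring path and its score."""
    scored = [(utility(p, leaf.gain, leaf.cost), leaf.action)
              for _, leaf, p in iter_paths(root)]
    best_score, best_action = max(scored)
    return best_action, best_score


# Hypothetical scenario tree: where is the apple, and what has the partner done?
root = Assumption("task: find the apple", 1.0, [
    Assumption("apple is in the fridge", 0.75, [
        Leaf("open the fridge", gain=10.0, cost=3.0),
    ]),
    Assumption("apple is not in the fridge", 0.25, [
        Assumption("partner already searched the bedroom", 0.5, [
            Leaf("search the living room", gain=8.0, cost=4.0),
        ]),
        Assumption("partner has not searched the bedroom", 0.5, [
            Leaf("ask the partner what they searched", gain=6.0, cost=1.0),
        ]),
    ]),
])

best_action, best_score = select_action(root)
print(best_action, best_score)
```

In this toy tree the communication action is just another leaf, so asking the partner is chosen only when its score beats acting on the most likely scenario, which mirrors the idea of guiding action selection without heavy communication.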

Paper Structure (68 sections, 3 equations, 13 figures, 17 tables)

Introduction
Related Work
Embodied Multi-Agent Cooperation.
Scaling LLM for Embodied Agents.
Tree-Structured Reasoning and Search.
Problem Definition
Method
Observation and Memory Modules
Planner: Reasoning for Action Selection
Composer: From Reasoning trace to Scenario Tree
Evaluator: Scenario likelihood, conditional gain, and execution cost
Scenario likelihood ($\mathcal{L}$).
Conditional gain ($\mathcal{G}$).
Execution cost ($C$).
Final score ($U$).
...and 53 more sections

Figures (13)

Figure 1: PCE employs a modular architecture with a Planner, Composer, and Evaluator pipeline for planning.
Figure 2: Flow from reasoning trace to action selection. (a) The Planner produces a reasoning trace. (b) The Composer extracts hypotheses from the trace, structures them into a decision tree, and, when needed, generates new assumptions and communication actions to expand unexplored branches. (c) The Evaluator scores each path; the highlighted path indicates the scenario whose leaf node achieves the maximum score ($U$), determining the agent’s final selected action.
Figure 3: Ablation results on LLM scaling in the C-WAH environment.
Figure 4: User study results in the C-WAH environment.
Figure 5: Example environment.
...and 8 more figures
