From Dialogue to Execution: Mixture-of-Agents Assisted Interactive Planning for Behavior Tree-Based Long-Horizon Robot Execution

Kanata Suzuki; Kazuki Hori; Haruka Miyoshi; Shuhei Kurita; Tetsuya Ogata

TL;DR

MoA-assisted interactive planning reduces the number of questions a human must answer by approximately 27% on cocktail-making tasks while keeping the generated Behavior Trees structurally and semantically similar to fully human-answered ones. Real-robot experiments on a smoothie-making task further demonstrate successful long-horizon execution.

Abstract

Interactive task planning with large language models (LLMs) enables robots to generate high-level action plans from natural language instructions. However, in long-horizon tasks, such approaches often require many questions, increasing user burden. Moreover, flat plan representations become difficult to manage as task complexity grows. We propose a framework that integrates Mixture-of-Agents (MoA)-based proxy answering into interactive planning and generates Behavior Trees (BTs) for structured long-term execution. The MoA consists of multiple LLM-based expert agents that answer general or domain-specific questions when possible, reducing unnecessary human intervention. The resulting BT hierarchically represents task logic and enables retry mechanisms and dynamic switching among multiple robot policies. Experiments on cocktail-making tasks show that the proposed method reduces human response requirements by approximately 27% while maintaining structural and semantic similarity to fully human-answered BTs. Real-robot experiments on a smoothie-making task further demonstrate successful long-horizon execution with adaptive policy switching and recovery from action failures. These results indicate that MoA-assisted interactive planning improves dialogue efficiency while preserving execution quality in real-world robotic tasks.
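The retry and policy-switching behavior described in the abstract can be illustrated with a minimal Behavior Tree sketch. This is not the authors' implementation: the node classes, the `make_flaky` helper, and the action names are hypothetical, and plain Python callables stand in for learned robot policies. A `Retry` node re-ticks a failing action, and a `Fallback` node switches to an alternative policy when the first one fails.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Action:
    """Leaf node: runs one robot policy (here a hypothetical callable)."""
    name: str
    policy: Callable[[], bool]  # returns True when the skill succeeds

    def tick(self) -> Status:
        return Status.SUCCESS if self.policy() else Status.FAILURE


@dataclass
class Retry:
    """Decorator node: re-ticks its child up to max_attempts times."""
    child: object
    max_attempts: int = 3

    def tick(self) -> Status:
        for _ in range(self.max_attempts):
            if self.child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


@dataclass
class Sequence:
    """Composite node: succeeds only if every child succeeds, in order."""
    children: List[object] = field(default_factory=list)

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Composite node: tries children in order until one succeeds."""
    children: List[object] = field(default_factory=list)

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def make_flaky(fail_times: int):
    """Build a stand-in policy that fails fail_times times, then succeeds."""
    state = {"calls": 0}

    def policy() -> bool:
        state["calls"] += 1
        return state["calls"] > fail_times

    return policy, state


def build_demo_tree():
    """Assemble a two-step tree with one retried action and one policy switch."""
    insert_policy, insert_state = make_flaky(fail_times=2)
    tree = Sequence([
        # Recovery from action failure: retry the same skill.
        Retry(Action("insert_fruit", insert_policy), max_attempts=3),
        # Policy switching: try policy A, fall back to policy B.
        Fallback([
            Action("close_lid_policy_a", lambda: False),
            Action("close_lid_policy_b", lambda: True),
        ]),
    ])
    return tree, insert_state
```

Ticking the tree returned by `build_demo_tree()` succeeds after the first action is retried and the second step falls through to its alternative policy. A production system would more likely rely on an established BT library than on hand-written node classes like these.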

Paper Structure (14 sections, 1 equation, 6 figures, 5 tables)

INTRODUCTION
RELATED WORK
PROPOSED METHOD
BT Generation via MoA-Based Interactive Task Planning
Proxy Response Mechanism via MoA
Example Design of Proxy-Response Agents
Task Execution on a Real Robot Using the Generated BT
EXPERIMENTS
Experiment 1
Experiment 2
RESULTS AND DISCUSSION
Evaluation of Generated BTs
Long-Horizon Task Execution on a Real Robot
CONCLUSION

Figures (6)

Figure 1: Overview of the proposed MoA-assisted interactive planning framework. A natural language task instruction is refined through dialogue between an LLM planner and MoA-based proxy responders. The refined specification is converted into a Behavior Tree, which is executed on a real robot for long-horizon task completion.
Figure 2: Detailed pipeline of the proposed framework. Given a task instruction, the LLM analyzes uncertainty in the generated BT and generates clarification questions. The MoA-based proxy-response mechanism resolves answerable questions using domain-specific, robot-specific, or commonsense expertise. Once ambiguity is resolved, the BT's action nodes are assigned appropriate learning models for execution.
Figure 3: Prompt design for the Mixture-of-Agents framework. Three expert agents (Robot Expert, Task Domain Expert, and Commonsense Expert) follow a structured response format consisting of answerability analysis, partial answering, and delegation of unresolved questions. This design enables collaborative proxy answering while preserving uncertainty for human resolution.
Figure 4: Experimental setup for the smoothie-making task. The dual-arm robot performs fruit insertion, lid manipulation, and switch operation.
Figure 5: Behavior Tree generated through MoA-assisted interactive planning. The figure shows the hierarchical BT structure for the smoothie-making task along with example question–answer interactions. Robot-related clarification questions are answered by the Robot Expert agent, while user-preference-related questions are resolved by the human. Different action nodes are assigned either Diffusion Policy or $\pi_{0.5}$ models.
...and 1 more figure
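The proxy-answering flow summarized in the Figure 3 caption (answerability analysis, answering where possible, delegation of the rest to the human) could be sketched roughly as follows. This is an illustrative assumption, not the paper's prompt design: `ExpertReply`, `keyword_expert`, and `route_questions` are hypothetical names, and simple keyword matching stands in for LLM-based expert agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ExpertReply:
    answerable: bool               # result of the answerability analysis
    answer: Optional[str] = None   # filled in only when answerable


# An expert maps a question to a reply; a real agent would call an LLM.
Expert = Callable[[str], ExpertReply]


def keyword_expert(keywords: List[str], answer: str) -> Expert:
    """Hypothetical stand-in expert that answers if a keyword matches."""
    def expert(question: str) -> ExpertReply:
        if any(k in question.lower() for k in keywords):
            return ExpertReply(True, answer)
        return ExpertReply(False)
    return expert


def route_questions(
    questions: List[str], experts: Dict[str, Expert]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Return (proxy answers keyed by question, questions left for the human)."""
    proxy_answers: Dict[str, Tuple[str, str]] = {}
    for_human: List[str] = []
    for question in questions:
        for name, expert in experts.items():
            reply = expert(question)
            if reply.answerable:
                proxy_answers[question] = (name, reply.answer)
                break
        else:
            # No expert could answer: delegate to the human user.
            for_human.append(question)
    return proxy_answers, for_human


experts = {
    "robot": keyword_expert(["gripper"], "Two parallel grippers."),
    "commonsense": keyword_expert(["wash"], "Yes, wash the fruit first."),
}
questions = [
    "What gripper does the robot have?",
    "Which fruit do you prefer?",
]
answers, remaining = route_questions(questions, experts)
```

In this toy run the robot-related question is answered by the stand-in robot expert, while the preference question is left for the human, mirroring the division of labor the captions describe.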
