ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Jae-Woo Choi, Hyungmin Kim, Hyobin Ong, Minsu Jang, Dohyung Kim, Jaehong Kim, Youngwoo Yoon
TL;DR
ReAcTree tackles long-horizon planning under partial observability by dynamically constructing a hierarchical agent tree. Each agent node handles one subgoal and can reason, act, or expand the tree, while control flow nodes coordinate subgoals through sequence, fallback, or parallel execution. Two memory systems support the agents: episodic memory supplies subgoal-level in-context examples, and working memory shares environment observations across nodes. On the LoTa-Bench tasks (WAH-NL and ALFRED), ReAcTree and its memory-augmented variant outperform strong baselines such as ReAct and Tree-Planner across multiple LLMs, with gains that extend to smaller models; ablations indicate that both memory and control flow expressivity are critical. The results demonstrate robust, scalable long-horizon planning for embodied agents and suggest design directions for memory integration and dynamic subgoal expansion in real-world settings.
Abstract
Recent advancements in large language models (LLMs) have enabled significant progress in decision-making and task planning for embodied autonomous agents. However, most existing methods still struggle with complex, long-horizon tasks because they rely on a monolithic trajectory that entangles all past decisions and observations, attempting to solve the entire task in a single unified process. To address this limitation, we propose ReAcTree, a hierarchical task-planning method that decomposes a complex goal into more manageable subgoals within a dynamically constructed agent tree. Each subgoal is handled by an LLM agent node capable of reasoning, acting, and further expanding the tree, while control flow nodes coordinate the execution strategies of agent nodes. In addition, we integrate two complementary memory systems: each agent node retrieves goal-specific, subgoal-level examples from episodic memory and shares environment-specific observations through working memory. Experiments on the WAH-NL and ALFRED datasets demonstrate that ReAcTree consistently outperforms strong task-planning baselines such as ReAct across diverse LLMs. Notably, on WAH-NL, ReAcTree achieves a 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31%.
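The coordination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names (`AgentNode`, `ControlFlowNode`), the `stub_policy` function standing in for an LLM call, and the dictionary used as working memory are all hypothetical. The sequence and fallback semantics follow common behavior-tree conventions, and the parallel policy (run every child, succeed only if all succeed) is an assumption. Dynamic tree expansion and episodic-memory retrieval are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Working memory is modeled as a plain dict shared by every node (assumption).
WorkingMemory = Dict[str, str]


@dataclass
class AgentNode:
    """Leaf node that handles one subgoal; `policy` stands in for an LLM agent."""
    subgoal: str
    policy: Callable[[str, WorkingMemory], bool]

    def run(self, memory: WorkingMemory) -> bool:
        return self.policy(self.subgoal, memory)


@dataclass
class ControlFlowNode:
    """Coordinates child nodes with behavior-tree-style semantics (assumed)."""
    kind: str  # "sequence", "fallback", or "parallel"
    children: list = field(default_factory=list)

    def run(self, memory: WorkingMemory) -> bool:
        if self.kind == "sequence":
            # Run children in order; stop at the first failure.
            return all(child.run(memory) for child in self.children)
        if self.kind == "fallback":
            # Run children in order; stop at the first success.
            return any(child.run(memory) for child in self.children)
        if self.kind == "parallel":
            # Run every child regardless of failures (executed one after
            # another here); succeed only if all of them succeed.
            results = [child.run(memory) for child in self.children]
            return all(results)
        raise ValueError(f"unknown control flow kind: {self.kind}")


def stub_policy(subgoal: str, memory: WorkingMemory) -> bool:
    """Toy replacement for an LLM agent: 'find X' records where X was seen,
    'fetch X' succeeds only if X's location is already in working memory."""
    if subgoal.startswith("find "):
        memory[subgoal[len("find "):]] = "kitchen"
        return True
    if subgoal.startswith("fetch "):
        return subgoal[len("fetch "):] in memory
    return False


memory: WorkingMemory = {}
tree = ControlFlowNode("sequence", [
    AgentNode("find apple", stub_policy),
    ControlFlowNode("fallback", [
        AgentNode("fetch pear", stub_policy),   # fails: pear was never observed
        AgentNode("fetch apple", stub_policy),  # succeeds via shared memory
    ]),
])
result = tree.run(memory)
print(result, memory)  # True {'apple': 'kitchen'}
```

In this toy run, the second agent succeeds only because the first one wrote its observation to the shared dictionary, which mirrors the role the abstract assigns to working memory.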
