Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

Shanwei Fan, Bin Zhang, Zhiwei Xu, Yingxuan Teng, Siqi Dai, Lin Cheng, Guoliang Fan

TL;DR

This work tackles the misalignment between abstract LLM plans and actionable environment behaviors in open-world RL. It introduces SGA-ACR, which constructs an environment-specific subgoal graph and an entity knowledge base offline. During online interaction, it then employs a multi-LLM actor-critic-refiner planning pipeline augmented by graph-based retrieval, with a subgoal tracker providing execution feedback. Empirical results on 22 Crafter tasks show that SGA-ACR yields better planning quality, stronger plan-execution alignment, and greater robustness across LLM scales than strong baselines. The contributions are a structured knowledge framework, a multi-LLM planning pipeline that requires no fine-tuning, and a bidirectional subgoal tracker. Together, these enable efficient and reliable open-world RL with existing LLMs.
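
The subgoal graph and its graph-based retrieval are described only at a high level here. The following is a minimal sketch of one way such retrieval could work. It assumes each node lists AND-prerequisites (all required) and OR-prerequisites (any one suffices), mirroring the solid and dashed edges in the Figure 1 caption. The node names, the edges, the helper `retrieve_plan`, and the rule of expanding the first listed OR-option are hypothetical simplifications, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubgoalNode:
    """One subgoal in a hypothetical AND/OR subgoal graph."""
    name: str
    and_prereqs: list[str] = field(default_factory=list)  # AND-edges: all required
    or_prereqs: list[str] = field(default_factory=list)   # OR-edges: any one suffices


# Illustrative fragment loosely modeled on Crafter's tech tree (not the paper's graph).
GRAPH: dict[str, SubgoalNode] = {
    "collect_wood": SubgoalNode("collect_wood"),
    "place_table": SubgoalNode("place_table", and_prereqs=["collect_wood"]),
    "make_wood_pickaxe": SubgoalNode(
        "make_wood_pickaxe", and_prereqs=["collect_wood", "place_table"]
    ),
    "collect_stone": SubgoalNode("collect_stone", and_prereqs=["make_wood_pickaxe"]),
    "eat_cow": SubgoalNode("eat_cow"),
    "collect_sapling": SubgoalNode("collect_sapling"),
    "place_plant": SubgoalNode("place_plant", and_prereqs=["collect_sapling"]),
    "eat_plant": SubgoalNode("eat_plant", and_prereqs=["place_plant"]),
    # Hypothetical aggregate node: either food source satisfies it.
    "restore_food": SubgoalNode("restore_food", or_prereqs=["eat_cow", "eat_plant"]),
}


def retrieve_plan(
    graph: dict[str, SubgoalNode], target: str, achieved: set[str]
) -> list[str]:
    """Return the unmet subgoals needed for `target`, prerequisites first."""
    plan: list[str] = []
    on_stack: set[str] = set()

    def visit(name: str) -> None:
        if name in achieved or name in plan:
            return  # already done or already scheduled
        if name in on_stack:
            raise ValueError(f"cycle detected at {name!r}")
        on_stack.add(name)
        node = graph[name]
        for prereq in node.and_prereqs:
            visit(prereq)  # every AND-prerequisite must be satisfied
        options = node.or_prereqs
        if options and not any(o in achieved or o in plan for o in options):
            visit(options[0])  # hypothetical tie-break: expand the first OR-option
        on_stack.discard(name)
        plan.append(name)

    visit(target)
    return plan


print(retrieve_plan(GRAPH, "collect_stone", achieved={"collect_wood"}))
# ['place_table', 'make_wood_pickaxe', 'collect_stone']
```

A depth-first walk returns prerequisites before their dependents. The resulting list could therefore serve as grounded context for an LLM planner, though the paper's actual retrieval format is not specified here.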

Abstract

Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by poor planning-execution alignment, which reflects a critical gap between abstract plans and actionable, environment-compatible behaviors. This misalignment arises from two interrelated limitations: (1) LLMs often produce subgoals that are semantically plausible but infeasible or irrelevant in the target environment due to insufficient grounding in environment-specific knowledge, and (2) single-LLM planning conflates generation with self-verification, resulting in overconfident yet unreliable subgoals that frequently fail during execution. To address these challenges, we propose Subgoal Graph-Augmented Actor-Critic-Refiner (SGA-ACR), a framework that integrates an environment-specific subgoal graph and structured entity knowledge with a multi-LLM planning pipeline that explicitly separates generation, critique, and refinement to produce executable and verifiable subgoals. A subgoal tracker further monitors execution progress, provides auxiliary rewards, and adaptively updates the subgoal graph to maintain alignment between plans and actions. Experimental results on 22 diverse tasks in the open-world game "Crafter" demonstrate the effectiveness of our proposed method.
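
The abstract's central design choice is to separate generation, critique, and refinement across distinct LLM roles. The following is a minimal sketch of that control flow. It assumes each role is a plain callable that would wrap its own LLM prompt in practice. The names `plan_with_acr`, `Critique`, the stub roles, and the `max_rounds` budget are hypothetical. The toy critic checks feasibility against a fixed set of known subgoals as a stand-in for the paper's knowledge-grounded critique.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    accepted: bool
    feedback: str = ""


# Each role would be a separate LLM call in practice; here they are plain callables.
Actor = Callable[[str, str], List[str]]               # (task, context) -> draft plan
Critic = Callable[[str, List[str]], Critique]          # (task, plan) -> verdict
Refiner = Callable[[str, List[str], str], List[str]]  # (task, plan, feedback) -> revision


def plan_with_acr(task: str, context: str, actor: Actor, critic: Critic,
                  refiner: Refiner, max_rounds: int = 3) -> Tuple[List[str], bool]:
    """Generate, critique, and refine a subgoal plan; return (plan, accepted)."""
    plan = actor(task, context)  # generation is kept separate from verification
    for round_idx in range(max_rounds + 1):
        verdict = critic(task, plan)
        if verdict.accepted:
            return plan, True
        if round_idx < max_rounds:
            plan = refiner(task, plan, verdict.feedback)
    return plan, False  # refinement budget exhausted without acceptance


# Toy stand-ins: the critic rejects subgoals absent from a known-subgoal set.
KNOWN_SUBGOALS = {"collect_wood", "place_table", "make_wood_pickaxe"}


def stub_actor(task: str, context: str) -> List[str]:
    return ["collect_wood", "fly_to_moon", "place_table"]


def stub_critic(task: str, plan: List[str]) -> Critique:
    infeasible = [s for s in plan if s not in KNOWN_SUBGOALS]
    return Critique(accepted=not infeasible, feedback=",".join(infeasible))


def stub_refiner(task: str, plan: List[str], feedback: str) -> List[str]:
    rejected = set(feedback.split(",")) if feedback else set()
    return [s for s in plan if s not in rejected]


print(plan_with_acr("place a table", "", stub_actor, stub_critic, stub_refiner))
# (['collect_wood', 'place_table'], True)
```

Keeping the critic as a separate call means the actor never grades its own output. This is the failure mode the abstract attributes to single-LLM planning.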

Paper Structure

This paper contains 35 sections, 13 equations, 17 figures, 4 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Framework of SGA-ACR. In the offline stage, the LLM extracts structured knowledge from background information (in the subgoal graph, dashed and solid lines denote OR-edges and AND-edges, respectively). In the online stage, the RL agent optimizes its policy through interaction with the environment, while the multi-LLM planning module generates plans to guide exploration and decision-making. The subgoal tracker coordinates the planning module and the RL agent.
  • Figure 2: An example of the subgoal tracker. From the state change, the checker identifies that the agent has achieved Place Table. The corresponding edge is then updated with a new weight, and an extra reward is assigned to the agent.
  • Figure 3: Comparison of reward and score learning curves.
  • Figure 4: Success rates (log scale) for unlocking each of the 22 achievements at 1M training steps.
  • Figure 5: Performance of SGA-ACR compared with LLM-guided RL baselines across various model scales.
  • ...and 12 more figures
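
The Figure 2 caption outlines the tracker's loop: detect an achieved subgoal from a state change, update the corresponding graph edge, and assign an extra reward. The following is a minimal sketch under explicit assumptions:

- The state is summarized as a dict of per-achievement counters.
- The updated edge runs from the previously completed subgoal to the new one.
- The edge weight moves toward 1 by a fixed step.

`SubgoalTracker`, `bonus`, and `lr` are hypothetical, and the paper's actual weighting and reward rules may differ.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class SubgoalTracker:
    """Toy tracker: detect newly achieved subgoals, reweight edges, emit bonus reward."""

    def __init__(self, bonus: float = 0.1, lr: float = 0.2) -> None:
        self.bonus = bonus  # hypothetical auxiliary reward for hitting the planned subgoal
        self.lr = lr        # hypothetical step size for edge-weight updates
        self.edge_weight: Dict[Tuple[Optional[str], str], float] = defaultdict(float)
        self.last_achieved: Optional[str] = None

    def step(self, prev_counts: Dict[str, int], curr_counts: Dict[str, int],
             target: str) -> float:
        """Compare achievement counters before/after a step; return the extra reward."""
        newly = [k for k, v in curr_counts.items() if v > prev_counts.get(k, 0)]
        reward = 0.0
        for name in newly:
            # Edge from the previously completed subgoal to this one (None at start).
            edge = (self.last_achieved, name)
            w = self.edge_weight[edge]
            self.edge_weight[edge] = w + self.lr * (1.0 - w)  # move weight toward 1
            if name == target:
                reward += self.bonus  # extra reward only for the planned subgoal
            self.last_achieved = name
        return reward


tracker = SubgoalTracker()
before = {"collect_wood": 1, "place_table": 0}
after = {"collect_wood": 1, "place_table": 1}
print(tracker.step(before, after, target="place_table"))  # 0.1
print(dict(tracker.edge_weight))  # {(None, 'place_table'): 0.2}
```

Because the edge weights are updated from observed completions, a planner reading the graph could favor transitions the agent has actually executed. This is one plausible reading of the "bidirectional" feedback described in the TL;DR.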