SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

Haoye Lu, Pavan Seshadri, Kaheer Suleman

TL;DR

This work tackles long-horizon planning in text-based environments by reducing reliance on expensive, repeatedly queried LLM planners. It introduces SCOPE, a one-shot hierarchical planner that uses LLM-generated subgoals only at initialization to pretrain a lightweight employee-manager system, which is then fine-tuned with RL using world models. On TextCraft, SCOPE reaches a 0.56 success rate against 0.52 for the LLM-based hierarchical agent ADaPT, while reducing inference time from 164.4 seconds to 3.0 seconds. The findings suggest that subgoal-based decomposition can guide hierarchical planning efficiently and remain effective even when the LLM-generated subgoals are suboptimal.

Abstract

Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests that large language models (LLMs) encode rich semantic knowledge about the world, which can be valuable for guiding agents in high-level reasoning and planning across both embodied and purely textual settings. However, existing approaches often depend heavily on querying LLMs during training and inference, making them computationally expensive and difficult to deploy efficiently. In addition, these methods typically employ a pretrained, unaltered LLM whose parameters remain fixed throughout training, providing no opportunity for adaptation to the target task. To address these limitations, we introduce SCOPE (Subgoal-COnditioned Pretraining for Efficient planning), a one-shot hierarchical planner that leverages LLM-generated subgoals only at initialization to pretrain a lightweight student model. Unlike prior approaches that distill LLM knowledge by repeatedly prompting the model to adaptively generate subgoals during training, our method derives subgoals directly from example trajectories. This design removes the need for repeated LLM queries, significantly improving efficiency, though at the cost of reduced explainability and potentially suboptimal subgoals. Despite their suboptimality, our results on the TextCraft environment show that LLM-generated subgoals can still serve as a strong starting point for hierarchical goal decomposition in text-based planning tasks. Compared to the LLM-based hierarchical agent ADaPT (Prasad et al., 2024), which achieves a 0.52 success rate, our method reaches 0.56 and reduces inference time from 164.4 seconds to just 3.0 seconds.
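To put the reported comparison in proportion, the short calculation below derives the speedup and the success-rate gain from the four figures quoted in the abstract. The variable names are illustrative, and the abstract does not state what unit of work the inference times are measured over, so the ratio is only meaningful under the assumption that both times cover the same workload.

```python
# Figures quoted in the abstract (ADaPT vs. SCOPE on TextCraft).
adapt_success, scope_success = 0.52, 0.56
adapt_seconds, scope_seconds = 164.4, 3.0

# Ratio of inference times, assuming both cover the same workload.
speedup = adapt_seconds / scope_seconds

# Absolute and relative change in success rate.
absolute_gain = scope_success - adapt_success
relative_gain = absolute_gain / adapt_success

print(f"speedup: {speedup:.1f}x")
print(f"success-rate gain: {absolute_gain:.2f} absolute, {relative_gain:.1%} relative")
```

Under that assumption, the inference-time reduction is roughly a 55-fold speedup, while the success-rate improvement is a modest 0.04 in absolute terms.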

Paper Structure

This paper contains 27 sections, 10 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: TextCraft Environment. An example crafting dependency chain for producing the target item (lime stained glass pane). The agent must gather base items, synthesize intermediate items through the provided crafting commands, and execute the sequence in the correct order to obtain the final reward.
  • Figure 2: The RL training pipeline of the employee agent.
  • Figure 3: Comparison of LLM-generated and hand-engineered subgoal decompositions for the demonstration trajectory shown above. Additional samples are provided in the appendix.
  • Figure 4: Validation trajectory success rate for the manager agent during RL fine-tuning (Hand-engineered-subgoal). The manager progressively adapts its subgoal proposals to compensate for employee imperfections, yielding steadily improving performance.
  • Figure 5: Subgoal vs. ultimate goal success rate in SCOPE.
  • ...and 2 more figures
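
The crafting dependency chain described in the Figure 1 caption can be pictured as a small graph problem: every ingredient must be obtained before the item that needs it. The sketch below is a minimal illustration, not the paper's method or the TextCraft API. The recipe table, the `crafting_order` function, and the "get"/"craft" step strings are all hypothetical, and quantities are ignored.

```python
from typing import Dict, List, Set

# Hypothetical recipe table invented for illustration: item -> ingredients.
# Items with no ingredients are treated as base items to gather.
RECIPES: Dict[str, List[str]] = {
    "lime stained glass pane": ["lime stained glass"],
    "lime stained glass": ["glass", "lime dye"],
    "glass": [],
    "lime dye": [],
}


def crafting_order(target: str, recipes: Dict[str, List[str]]) -> List[str]:
    """Return steps so that every ingredient precedes the item that uses it.

    Assumes the recipe graph has no cycles and that every ingredient
    appears as a key in `recipes`.
    """
    steps: List[str] = []
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen:
            return
        seen.add(item)
        # Depth-first: resolve all ingredients before the item itself.
        for ingredient in recipes[item]:
            visit(ingredient)
        verb = "craft" if recipes[item] else "get"
        steps.append(f"{verb} {item}")

    visit(target)
    return steps


for step in crafting_order("lime stained glass pane", RECIPES):
    print(step)
```

Each intermediate item in such a chain is a natural candidate for a subgoal, which is why a decomposition into subgoals fits this kind of environment.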