Learning adaptive planning representations with natural language guidance

Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas

TL;DR

The paper tackles long-horizon planning by using language models to bootstrap a library of task-specific, hierarchical planning operators. Ada iteratively proposes, verifies, and grounds high-level abstractions with low-level controllers, guided by interactive planning and LM-driven goal proposals. On the Mini Minecraft and ALFRED benchmarks, Ada plans more accurately and generalizes better than baselines that rely solely on LM-predicted goals or code policies, and it does so by learning compact operator libraries that align with ground-truth skills. The results suggest that LM-guided grounding of hierarchical planning representations is a workable route to more complex tasks.

Abstract

Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.

Paper Structure (16 sections, 4 equations, 5 figures, 2 tables, 1 algorithm)

This paper contains 16 sections, 4 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Our approach solves complex planning tasks specified in language and grounded in interactive environments by jointly learning a library of symbolic high-level action abstractions and modular low-level controllers associated with each abstraction. Our system leverages background information in language as a prior to propose useful action abstractions, then uses a hierarchical planning framework to verify and ground them.
  • Figure 2: (a) The task input, (b) the bi-level planning and execution pipeline used at inference time, and (c) the abstract state and action representation.
  • Figure 3: The overall framework. Given task environment states and descriptions, at each iteration we first propose candidate abstract actions (operators) $\mathcal{A}_i'$, then use bi-level planning and execution to solve tasks. We add operators to the operator library based on the execution results.
  • Figure 4: Our two-stage prompting method for generating candidate operator definitions. (a) Given a task instruction, we first prompt an LLM to generate a candidate symbolic task decomposition. (b) We then extract undefined operator names that appear in these action sequences and prompt an LLM to generate symbolic definitions.
  • Figure 5: Top: (a) Visualization of the Mini Minecraft environment, showing an intermediate step towards crafting a bed. (b) Operator proposed by an LLM and verified by our algorithm through planning and execution. (c) Visualization of the low-level crafting tools involved in crafting the bed. Bottom: (a) Visualization of the ALFRED household environment. (b) Example operators proposed by an LLM and verified by our algorithm, which are composed to solve the cold potato slice task.
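
The loop described in the captions of Figures 3 and 4 (propose candidate operators, plan and execute with them, keep operators based on the execution result) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Every name in it (`Operator`, `undefined_operator_names`, `learn_library`, and the three callables passed in) is hypothetical, and the retention rule used here (keep a candidate if it appears in a plan that executed successfully) is an assumption; the captions only say that operators are added "based on the execution result". The toy tasks and operator names at the bottom are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for a symbolic operator definition."""
    name: str
    definition: str = ""


def undefined_operator_names(decompositions: List[List[str]],
                             library: Set[str]) -> List[str]:
    """Collect operator names that appear in proposed action sequences but
    are not yet defined in the library (cf. Figure 4b), preserving order."""
    seen: Set[str] = set()
    missing: List[str] = []
    for plan in decompositions:
        for name in plan:
            if name not in library and name not in seen:
                seen.add(name)
                missing.append(name)
    return missing


def learn_library(
    tasks: List[str],
    propose_decomposition: Callable[[str], List[str]],   # stage (a): LM stub
    propose_definition: Callable[[str], Operator],       # stage (b): LM stub
    plan_and_execute: Callable[[str, Dict[str, Operator]], Tuple[bool, List[str]]],
    iterations: int = 2,
) -> Dict[str, Operator]:
    """Iteratively grow an operator library (cf. Figure 3)."""
    library: Dict[str, Operator] = {}
    for _ in range(iterations):
        # Propose candidate operators for names the library lacks.
        decomps = [propose_decomposition(t) for t in tasks]
        names = undefined_operator_names(decomps, set(library))
        candidates = {n: propose_definition(n) for n in names}
        trial = {**library, **candidates}
        # Try to solve each task with the current library plus candidates.
        for task in tasks:
            success, used = plan_and_execute(task, trial)
            if not success:
                continue
            # ASSUMED rule: keep candidates used in a successful execution.
            for name in used:
                if name in candidates:
                    library[name] = candidates[name]
    return library


# Toy stand-ins for the LM and the planner (all made up for illustration).
TOY_DECOMPOSITIONS = {
    "craft bed": ["mine_wood", "craft_plank", "craft_bed"],
    "toy failing task": ["mine_wood", "unverifiable_op"],
}


def toy_plan_and_execute(task: str, trial: Dict[str, Operator]):
    plan = TOY_DECOMPOSITIONS[task]
    ok = "unverifiable_op" not in plan  # pretend this operator never grounds
    return ok, (plan if ok else [])


library = learn_library(
    tasks=list(TOY_DECOMPOSITIONS),
    propose_decomposition=lambda t: TOY_DECOMPOSITIONS[t],
    propose_definition=lambda n: Operator(name=n, definition="(toy definition)"),
    plan_and_execute=toy_plan_and_execute,
)
print(sorted(library))
```

In this toy run the operator that never appears in a successful execution is proposed again at each iteration but never enters the library, which is the sense in which planning and execution act as a verification step for LM proposals.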