Learning Planning Abstractions from Language

Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

TL;DR

PARL tackles planning in complex, variable-object environments by learning planning-friendly abstractions from language. It uses a large language model to extract object-level and action concepts from instructions, then grounds them in demonstrations to learn a latent abstract state space $\mathcal{S}'$, an abstract transition model $\mathcal{T}'$, a feasibility model $f_{a'}$, and low-level policies $\pi_{a'}$. Planning runs in the abstract space as a BFS-like search scored by feasibility, and each step of the resulting plan is then refined by a low-level policy. The approach generalizes to unseen object counts, novel verb-noun compositions, and longer horizons; experiments in BabyAI and Kitchen-Worlds show improved planning efficiency and generalization over baselines.

Abstract

This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.
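The inference procedure described above (search over abstract actions using the transition and feasibility functions, then hand each step to a low-level policy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `plan_abstract`, the product-of-feasibilities score, the pruning threshold, and the toy key-and-door domain are all assumptions made for illustration. In PARL the abstract states are learned latent vectors and the two functions are learned networks; here plain callables stand in for them.

```python
from collections import deque

def plan_abstract(s0, actions, goal_action, transition, feasibility,
                  max_depth=3, min_feasibility=0.5):
    """Breadth-first search over abstract action sequences (illustrative only).

    Every candidate plan ends with `goal_action`. A plan's score is the
    product of per-step feasibilities (an assumed scoring rule).
    """
    best_plan, best_score = None, 0.0
    frontier = deque([(s0, [], 1.0)])  # (abstract state, plan prefix, prefix score)
    while frontier:
        state, prefix, score = frontier.popleft()
        # Candidate: finish the current prefix with the goal action.
        goal_score = score * feasibility(state, goal_action)
        if goal_score > best_score:
            best_plan, best_score = prefix + [goal_action], goal_score
        # Stop expanding once adding the goal action would reach max_depth.
        if len(prefix) + 1 >= max_depth:
            continue
        for action in actions:
            f = feasibility(state, action)
            if f < min_feasibility:  # prune steps predicted to be infeasible
                continue
            frontier.append((transition(state, action), prefix + [action], score * f))
    return best_plan, best_score

# Hypothetical toy domain: a door that can only be opened while holding a key.
def toy_transition(state, action):
    if action == "pickup(key)":
        return state | {"holding(key)"}
    if action == "open(door)":
        return state | {"open(door)"}
    return state

def toy_feasibility(state, action):
    if action == "pickup(key)":
        return 0.0 if "holding(key)" in state else 0.9
    if action == "open(door)":
        return 0.95 if "holding(key)" in state else 0.05
    return 0.0

plan, score = plan_abstract(
    frozenset(), ["pickup(key)", "open(door)"], "open(door)",
    toy_transition, toy_feasibility,
)
print(plan, round(score, 3))
```

In this toy run the search prefers picking up the key before opening the door, because opening the door directly has low predicted feasibility. In the full system, each abstract action in the returned plan would then be executed by its low-level policy $\pi_{a'}$.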

Paper Structure (26 sections, 2 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: The overview of our training and testing paradigm, and different types of generalizations supported by our framework. (a) Given paired demonstration trajectories and language descriptions, our framework discovers an abstract action space and a latent state abstraction that supports planning for diverse language goals. (b) The example illustrates that our model can generalize to a new kitchen environment, generalize to a goal that requires reasoning about the geometries of the sink and grey pan, and generalize to the combination of concepts red and pan that is unseen during training.
  • Figure 2: The overall framework of PARL. PARL takes paired natural language instructions and demonstration trajectories as inputs. It recovers object-level concepts such as shapes and colors, and action concepts from the natural language. It then learns a planning-compatible model for object and action concepts. At test time, given novel instructions, it performs a combined high-level planning and low-level policy unrolling to output the next action to take.
  • Figure 3: PARL prompts a pretrained large language model (LLM) to parse instructions into symbolic formulas. Next, we extract the object-level and action concepts from the formulas.
  • Figure 4: Neural network architectures for our planning-compatible models, composed of (a) an object-level PCT encoder for extracting state abstractions and (b) an abstract transition Transformer for abstract transition and feasibility prediction.
  • Figure 5: We evaluate our models on diverse task settings created in the BabyAI environments.
  • ...and 3 more figures