Learning Planning Abstractions from Language
Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu
TL;DR
PARL tackles planning in complex environments with varying numbers of objects by learning planning-friendly abstractions from language. It uses a large language model to extract object and action concepts from instructions, then grounds them in demonstrations to learn a latent abstract state space $\mathcal{S}'$, an abstract transition model $\mathcal{T}'$, a feasibility model $f_{a'}$, and low-level policies $\pi_{a'}$. Planning is performed in the abstract space via a BFS-like search with feasibility scoring, and each step of the resulting plan is then refined by a low-level controller. The approach generalizes to unseen object counts, novel verb-noun compositions, and longer horizons; experiments in BabyAI and Kitchen-Worlds show improved planning efficiency and generalization over baselines.
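To make the BFS-like search with feasibility scoring concrete, here is a minimal sketch rather than the authors' implementation. It assumes hand-written stand-ins for the learned models: `toy_transition` plays the role of $\mathcal{T}'$ and `toy_feasibility` the role of $f_{a'}$. The plan score (a product of per-step feasibilities), the pruning threshold `min_score`, and the toy key-and-door domain are all illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[bool, bool, bool]  # toy abstract state: (has_key, door_open, at_goal)
Action = str


def plan_abstract(
    s0: State,
    actions: Sequence[Action],
    transition: Callable[[State, Action], State],
    feasibility: Callable[[State, Action], float],
    is_goal: Callable[[State], bool],
    max_depth: int = 4,
    min_score: float = 0.1,
) -> Optional[Tuple[List[Action], float]]:
    """Breadth-first search over abstract action sequences.

    Returns the highest-scoring plan at the shallowest depth that reaches
    the goal, together with its score, or None if no plan is found.
    """
    if is_goal(s0):
        return [], 1.0
    frontier = [(s0, [], 1.0)]  # (abstract state, plan so far, plan score)
    for _ in range(max_depth):
        next_frontier = []
        goal_plans = []
        for state, plan, score in frontier:
            for a in actions:
                # Assumption: plan score is the product of step feasibilities.
                new_score = score * feasibility(state, a)
                if new_score < min_score:
                    continue  # prune branches judged infeasible
                nxt = transition(state, a)
                candidate = (nxt, plan + [a], new_score)
                (goal_plans if is_goal(nxt) else next_frontier).append(candidate)
        if goal_plans:
            _, best_plan, best_score = max(goal_plans, key=lambda c: c[2])
            return best_plan, best_score
        frontier = next_frontier
    return None


# Hypothetical hand-coded models standing in for the learned T' and f_{a'}.
ACTIONS = ["pickup key", "open door", "go to goal"]


def toy_transition(s: State, a: Action) -> State:
    has_key, door_open, at_goal = s
    if a == "pickup key":
        return (True, door_open, at_goal)
    if a == "open door":
        return (has_key, True, at_goal)
    return (has_key, door_open, True)


def toy_feasibility(s: State, a: Action) -> float:
    has_key, door_open, _ = s
    if a == "pickup key":
        return 0.0 if has_key else 0.9
    if a == "open door":
        return 0.9 if has_key and not door_open else 0.05
    return 0.9 if door_open else 0.05


def toy_is_goal(s: State) -> bool:
    return s[2]


result = plan_abstract((False, False, False), ACTIONS,
                       toy_transition, toy_feasibility, toy_is_goal)
plan, score = result
# plan is ['pickup key', 'open door', 'go to goal'] with score about 0.729
```

In this toy domain the feasibility scores prune every branch that tries to open the door without the key or reach the goal through a closed door, so the search expands only one candidate per depth.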
Abstract
This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.
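The refinement step at inference time, where each abstract action in the high-level plan is handed to a low-level policy, can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the termination check `achieved`, the per-action step budget, and the one-dimensional toy environment (`toy_policy`, `toy_step`, `toy_achieved`) are all hypothetical.

```python
from typing import Callable, List, Tuple

AbstractAction = Tuple[str, int]  # toy abstract action: ("goto", target_position)


def execute_plan(
    plan: List[AbstractAction],
    policy: Callable[[int, AbstractAction], int],
    achieved: Callable[[int, AbstractAction], bool],
    step: Callable[[int, int], int],
    obs: int,
    max_steps_per_action: int = 20,
) -> Tuple[int, bool]:
    """Run the low-level policy for each abstract action in turn.

    Returns the final observation and whether every abstract action was
    completed within its step budget.
    """
    for abstract_action in plan:
        for _ in range(max_steps_per_action):
            if achieved(obs, abstract_action):
                break
            # The low-level policy is conditioned on the current abstract action.
            obs = step(obs, policy(obs, abstract_action))
        else:
            # Budget exhausted: re-check after the last step before giving up.
            if not achieved(obs, abstract_action):
                return obs, False
    return obs, True


# Hypothetical 1-D world: the observation is an integer position.
def toy_policy(obs: int, a: AbstractAction) -> int:
    return 1 if a[1] > obs else -1


def toy_achieved(obs: int, a: AbstractAction) -> bool:
    return obs == a[1]


def toy_step(obs: int, low_level_action: int) -> int:
    return obs + low_level_action


final_obs, success = execute_plan(
    [("goto", 3), ("goto", 1)], toy_policy, toy_achieved, toy_step, obs=0
)
# final_obs is 1 and success is True
```

Returning a failure flag when a step budget runs out leaves room for the caller to replan from the current observation, which is one possible design choice rather than something the abstract specifies.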
