Learning with Language-Guided State Abstractions
Andi Peng, Ilia Sucholutsky, Belinda Z. Li, Theodore R. Sumers, Thomas L. Griffiths, Jacob Andreas, Julie A. Shah
TL;DR
The paper introduces Language-Guided Abstraction (LGA), a framework that uses natural language task descriptions and language models (LMs) to automatically construct task-relevant state abstractions for imitation learning. LGA converts raw perceptual inputs into a text-based feature set, uses an LM to select the features relevant to the task, and instantiates a compact abstract state from them, so that an abstraction-conditioned policy learns over a simplified representation. In the VIMA environment, LGA abstractions perform on par with human-designed ones while requiring substantially less human effort, and policies trained on them are robust to covariate shift and linguistic ambiguity, including zero-shot generalization to unseen commands. Experiments on a real Spot robot further show that LGA supports robust, sample-efficient mobile manipulation in the presence of distractors and ambiguous goals.
Abstract
We describe a framework for using natural language to design state abstractions for imitation learning. Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations, which can surface important features of an environment and hide irrelevant ones. These state representations are typically manually specified, or derived from other labor-intensive labeling procedures. Our method, LGA (language-guided abstraction), uses a combination of natural language supervision and background knowledge from language models (LMs) to automatically build state representations tailored to unseen tasks. In LGA, a user first provides a (possibly incomplete) description of a target task in natural language; next, a pre-trained LM translates this task description into a state abstraction function that masks out irrelevant features; finally, an imitation policy is trained using a small number of demonstrations and LGA-generated abstract states. Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time, and that these abstractions improve generalization and robustness in the presence of spurious correlations and ambiguous specifications. We illustrate the utility of the learned abstractions on mobile manipulation tasks with a Spot robot.
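The first two steps described above (task description in, feature-masking abstraction function out) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `State`, `keyword_selector`, and `build_abstraction` are hypothetical, the state is assumed to be a flat mapping from feature names to values, and a simple keyword match stands in for the pre-trained LM. Unlike an LM, the stand-in cannot draw on background knowledge to handle incomplete or ambiguous descriptions.

```python
import re
from typing import Callable, Dict, List, Set

# Assumption: a state is a flat mapping from feature name to value.
State = Dict[str, float]
# A selector takes the task text and candidate feature names and
# returns the subset judged relevant. In LGA this role is played by an LM.
Selector = Callable[[str, List[str]], Set[str]]


def keyword_selector(task: str, feature_names: List[str]) -> Set[str]:
    """Hypothetical stand-in for the LM: keep features named in the task text."""
    task_words = set(re.findall(r"[a-z]+", task.lower()))
    return {name for name in feature_names if name.lower() in task_words}


def build_abstraction(
    task: str,
    feature_names: List[str],
    selector: Selector = keyword_selector,
) -> Callable[[State], State]:
    """Turn a task description into a function that drops irrelevant features."""
    relevant = selector(task, feature_names)

    def abstract(state: State) -> State:
        # Keep only the features the selector marked as task-relevant.
        return {name: value for name, value in state.items() if name in relevant}

    return abstract


# Usage: build the abstraction once per task, then apply it to each raw state.
abstract = build_abstraction("Bring me the bowl.", ["bowl", "pan", "table"])
abstract_state = abstract({"bowl": 1.0, "pan": 2.0, "table": 3.0})
```

In this sketch the abstract states, rather than the raw states, would be paired with demonstration actions to train the imitation policy. Passing the selector as a parameter keeps the masking logic separate from the choice of selection mechanism, so an LM-backed selector could be substituted without changing `build_abstraction`.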
