SEAL: SEmantic-Augmented Imitation Learning via Language Model
Chengyang Gu, Yuxin Pan, Haotian Bai, Hui Xiong, Yize Chen
TL;DR
SEAL addresses long-horizon hierarchical imitation learning by using pretrained LLMs to define semantically meaningful sub-goals and pre-label states, eliminating the need for task-specific hierarchies. It introduces a dual-encoder Sub-goal Learner (LLM-supervised and unsupervised VQ) and a transition-augmented low-level policy to emphasize critical sub-goal transitions during imitation. End-to-end training optimizes a weighted combination of high- and low-level objectives, with dynamic confidences guiding the integration of the encoders. Empirical results on KeyDoor and Grid-World show SEAL outperforming BC, LISA, SDIL, and Thought Cloning, especially in low-data and longer-horizon scenarios, demonstrating robust sub-goal discovery, better transition handling, and improved generalization to task variations.
Abstract
Hierarchical Imitation Learning (HIL) is a promising approach for tackling long-horizon decision-making tasks. However, it remains challenging because detailed supervisory labels for sub-goal learning are lacking and because it relies on hundreds to thousands of expert demonstrations. In this work, we introduce SEAL, a novel framework that leverages the powerful semantic and world knowledge of Large Language Models (LLMs) both to specify the sub-goal space and to pre-label states with semantically meaningful sub-goal representations, without prior knowledge of task hierarchies. SEAL employs a dual-encoder structure, combining supervised LLM-guided sub-goal learning with unsupervised Vector Quantization (VQ) for more robust sub-goal representations. Additionally, SEAL incorporates a transition-augmented low-level planner for improved adaptation to sub-goal transitions. Our experiments demonstrate that SEAL outperforms state-of-the-art HIL methods and LLM-based planning approaches, particularly in settings with small expert datasets and complex long-horizon tasks.
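To make the unsupervised Vector Quantization (VQ) component more concrete, the following is a minimal sketch of the generic VQ assignment step: each state embedding is mapped to the index of its nearest codebook vector, which can then serve as a discrete sub-goal label. The function name `vq_assign`, the array shapes, and the use of squared Euclidean distance are illustrative assumptions; this is not SEAL's actual implementation.

```python
import numpy as np

def vq_assign(embeddings: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each embedding, the index of its nearest codebook vector.

    embeddings: (num_states, dim) array of state embeddings (hypothetical input).
    codebook:   (num_codes, dim) array of candidate sub-goal vectors (hypothetical).
    """
    # Pairwise differences via broadcasting: shape (num_states, num_codes, dim).
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    # Squared Euclidean distance to every code: shape (num_states, num_codes).
    dists = np.sum(diffs ** 2, axis=-1)
    # Nearest code per state; the index acts as a discrete sub-goal label.
    return np.argmin(dists, axis=1)

# Toy usage: three 2-D embeddings, each lying near a different code.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
embeddings = np.array([[0.1, -0.1], [0.9, 1.2], [3.5, 4.2]])
indices = vq_assign(embeddings, codebook)
```

In a full dual-encoder setup, such VQ labels would be produced alongside the LLM-supervised sub-goal predictions; how the two are integrated is not shown here.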
