Language-guided Skill Learning with Temporal Variational Inference
Haotian Fu, Pratyusha Sharma, Elias Stengel-Eskin, George Konidaris, Nicolas Le Roux, Marc-Alexandre Côté, Xingdi Yuan
TL;DR
The paper addresses the sample inefficiency of long-horizon RL with LAST, a framework that first uses a pretrained LLM to produce an initial, fine-grained segmentation of expert trajectories together with language annotations, and then refines those segments via temporal variational inference to discover reusable skills. The training objective combines a variational lower bound with an auxiliary code-length term based on the Minimum Description Length (MDL) principle, balancing reconstruction accuracy against skill economy. The learned skill library feeds an online hierarchical RL pipeline in which a SAC-based high-level policy switches between temporally extended skills. Key contributions include (1) language-guided initial segmentation, (2) a temporal variational inference mechanism that merges short segments into reusable skills, (3) an MDL auxiliary objective that promotes compact skill libraries, and (4) a demonstration of improved zero-shot transfer and faster learning on BabyAI and ALFRED. The approach yields semantically meaningful skills and strengthens long-horizon planning, which makes it relevant to sample-efficient RL in complex, multimodal environments.
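As a reading aid, one generic way to write an objective of this kind is shown below. This is a schematic sketch, not the paper's exact formulation: the symbols $\tau$ (a demonstration trajectory with states $s_{1:T}$ and actions $a_{1:T}$), $z_{1:T}$ (latent skill assignments), $q_\phi$ and $p_\theta$ (inference and generative models), the prior $p(z_{1:T})$, the weight $\lambda$, and the code-length term $\mathcal{L}_{\mathrm{MDL}}$ are all assumed notation.

```latex
\[
\max_{\theta,\phi}\;
\underbrace{\mathbb{E}_{q_\phi(z_{1:T}\mid\tau)}\!\left[\log p_\theta(a_{1:T}\mid s_{1:T}, z_{1:T})\right]
- \mathrm{KL}\!\left(q_\phi(z_{1:T}\mid\tau)\,\|\,p(z_{1:T})\right)}_{\text{variational lower bound}}
\;-\; \lambda\,\underbrace{\mathcal{L}_{\mathrm{MDL}}(z_{1:T})}_{\text{code length of the skill sequence}}
\]
```

Under this reading, a larger $\lambda$ favors shorter encodings of the demonstrations (fewer, more reusable skills), while the first two terms keep the skills faithful to the expert actions.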
Abstract
We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment.
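The idea of merging short segments under a description-length criterion can be illustrated with a toy sketch. The code below is not the paper's algorithm: it swaps the hierarchical variational inference for a simple greedy, byte-pair-style merge, and the function names, the `skill_cost` parameter, and the two-part code-length formula are all assumptions made for illustration. Each demonstration is a list of segment labels, standing in for an LLM-proposed segmentation.

```python
import math
from collections import Counter


def description_length(sequences, skill_cost=8.0):
    """Two-part code length in bits: cost of the skill library plus
    cost of writing every sequence with a uniform code over that library."""
    library = {skill for seq in sequences for skill in seq}
    n_tokens = sum(len(seq) for seq in sequences)
    # Assumption: charge 1 bit per token when the library has a single skill.
    bits_per_token = math.log2(len(library)) if len(library) > 1 else 1.0
    return skill_cost * len(library) + n_tokens * bits_per_token


def merge_pair(seq, pair):
    """Replace non-overlapping, left-to-right occurrences of `pair` with one merged skill."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def greedy_mdl_merge(sequences, skill_cost=8.0):
    """Merge the most frequent adjacent pair while doing so shortens the description."""
    sequences = [list(seq) for seq in sequences]
    best = description_length(sequences, skill_cost)
    while True:
        pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
        if not pairs:
            return sequences, best
        pair, _ = pairs.most_common(1)[0]
        candidate = [merge_pair(seq, pair) for seq in sequences]
        cost = description_length(candidate, skill_cost)
        if cost >= best:  # stop once a merge no longer pays for itself
            return sequences, best
        sequences, best = candidate, cost


# Hypothetical segment labels for three demonstrations.
demos = [
    ["go", "pick", "go", "drop"],
    ["go", "pick", "go", "drop"],
    ["go", "pick", "go", "open"],
]
merged, bits = greedy_mdl_merge(demos, skill_cost=1.0)
print(merged, bits)
```

In this sketch, `skill_cost` plays the role of the compression trade-off: it sets how many bits each library entry costs, so raising it penalizes larger skill libraries relative to longer skill sequences.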
