PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Ruijie Zheng, Ching-An Cheng, Hal Daumé, Furong Huang, Andrey Kolobov
TL;DR
PRISE reframes temporal action abstraction in continuous control as a sequence compression problem by quantizing actions into discrete codes and applying Byte Pair Encoding to discover reusable skill tokens. The method combines a BYOL-inspired latent forward-dynamics objective and a VQ-VAE–style encoder to learn stable state-action representations, followed by BPE to form a vocabulary of variable-horizon skills. A high-level policy over tokens is trained to generate sequences of primitive actions via a decoder, enabling efficient multitask imitation learning and rapid few-shot adaptation to unseen tasks, demonstrated on Metaworld and LIBERO with strong improvements over baselines. The work highlights the practical value of combining state-action quantization with NLP-inspired tokenization to reduce learning complexity and improve data efficiency in robotic manipulation.
Abstract
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning and few-shot imitation learning on unseen tasks. Our code is released at https://github.com/FrankZheng2022/PRISE.
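The combination of action quantization and BPE described above can be illustrated with a minimal sketch. It is not the released PRISE implementation: it assumes a fixed, hand-written codebook and a nearest-neighbour lookup in place of the learned, state-conditioned quantizer, and the names `quantize`, `merge_pair`, `learn_bpe`, and `expand` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def quantize(actions, codebook):
    """Map each continuous action to the index of its nearest codebook entry.

    Assumption: a fixed codebook with squared-Euclidean distance stands in
    for the learned quantizer described in the abstract.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sqdist(a, codebook[k]))
            for a in actions]


def merge_pair(seq, left, right, new_token):
    """Replace every non-overlapping occurrence of (left, right) with new_token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_codes, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent token pair.

    Returns the merge table {new_token: (left, right)} and the compressed
    sequences. Merging stops early once no pair occurs at least twice.
    """
    seqs = [list(s) for s in sequences]
    merges = {}
    next_id = num_codes  # new skill tokens are numbered after the base codes
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break
        merges[next_id] = (left, right)
        seqs = [merge_pair(s, left, right, next_id) for s in seqs]
        next_id += 1
    return merges, seqs


def expand(token, merges):
    """Unroll a skill token back into its sequence of primitive codes."""
    if token not in merges:
        return [token]
    left, right = merges[token]
    return expand(left, merges) + expand(right, merges)


# Toy data: 2-D actions and a 3-entry codebook (both made up for illustration).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
traj_a = [(0.9, 0.1), (0.1, 0.9), (0.9, 0.0), (0.0, 1.1), (0.1, 0.0)]
traj_b = [(1.1, 0.0), (0.0, 0.8), (0.0, 0.1)]

codes = [quantize(t, codebook) for t in (traj_a, traj_b)]
merges, compressed = learn_bpe(codes, num_codes=len(codebook), num_merges=2)
skill = expand(4, merges)  # primitive codes covered by the second merged token
```

In this toy run the two demonstrations quantize to `[1, 2, 1, 2, 0]` and `[1, 2, 0]`; BPE first merges the pair `(1, 2)` and then merges the result with code `0`, so a single token ends up spanning three primitive steps. This is the sense in which merged tokens can act as skills of variable time span.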
