PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Ruijie Zheng, Ching-An Cheng, Hal Daumé, Furong Huang, Andrey Kolobov

TL;DR

PRISE reframes temporal action abstraction in continuous control as a sequence compression problem: it quantizes actions into discrete codes and applies Byte Pair Encoding (BPE) to discover reusable skill tokens. The method combines a BYOL-inspired latent forward-dynamics objective and a VQ-VAE–style encoder to learn stable state-action representations, followed by BPE to form a vocabulary of variable-horizon skills. A high-level policy over tokens is trained to generate sequences of primitive actions via a decoder, enabling efficient multitask imitation learning and rapid few-shot adaptation to unseen tasks, demonstrated on MetaWorld and LIBERO with strong improvements over baselines. The work highlights the practical value of combining state-action quantization with NLP-inspired tokenization to reduce learning complexity and improve data efficiency in robotic manipulation.

Abstract

Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning as well as few-shot imitation learning on unseen tasks. Our code is released at https://github.com/FrankZheng2022/PRISE.
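To make the "BPE over quantized actions" idea concrete, here is a minimal sketch of standard byte pair encoding applied to trajectories of integer action codes. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `merge_pair` and `learn_bpe`, the toy corpus, and the tie-breaking rule are all hypothetical choices made for this example.

```python
from collections import Counter


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_codes, num_merges):
    """Learn BPE merges over trajectories of discrete action codes.

    corpus: list of trajectories, each a list of ints in [0, num_codes).
    Returns (merges, vocab); vocab maps a token id to the tuple of
    primitive codes it expands to.
    """
    corpus = [list(traj) for traj in corpus]
    vocab = {c: (c,) for c in range(num_codes)}
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across all trajectories.
        counts = Counter()
        for traj in corpus:
            counts.update(zip(traj, traj[1:]))
        if not counts:
            break
        # Most frequent pair; ties broken by smallest pair (an assumption).
        pair = min(counts, key=lambda p: (-counts[p], p))
        if counts[pair] < 2:
            break  # nothing repeats, so merging would not compress
        new_token = len(vocab)
        vocab[new_token] = vocab[pair[0]] + vocab[pair[1]]
        merges.append((pair, new_token))
        corpus = [merge_pair(traj, pair, new_token) for traj in corpus]
    return merges, vocab


# Toy corpus: two quantized trajectories over a codebook of size 3.
merges, vocab = learn_bpe([[0, 1, 0, 1, 2], [0, 1, 2, 2]], num_codes=3, num_merges=2)
print(merges)  # [((0, 1), 3), ((3, 2), 4)]
print(vocab)
```

In this toy run, token 3 stands for the two-step code sequence `(0, 1)` and token 4 for the three-step sequence `(0, 1, 2)`, which is how merged tokens come to represent skills of variable time span.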

Paper Structure (19 sections, 6 equations, 15 figures, 4 tables)

Figures (15)

  • Figure 1: (a) Pretraining Stage I of PRISE: The goal is to learn an action quantization module that, conditioned on the current state and action $(o_t,a_t)$, assigns a discrete action code. (b) Pretraining Stage II of PRISE: First, PRISE converts each trajectory of continuous states and actions into discrete codes. Then, using the corpus of quantized trajectories from the multitask offline dataset, it applies BPE (illustrated in Figure 2) to learn a vocabulary of skill tokens, where each token represents a sequence of discrete action codes.
  • Figure 2: Byte Pair Encoding.
  • Figure 3: At evaluation time, PRISE rolls out its policy by querying the skill-token policy $\pi$ for a skill token and then using the pretrained decoder $\psi$ to decode actions.
  • Figure 4: PRISE tokenizes downstream demonstration trajectories by greedily searching for the longest token for each time step.
  • Figure 5: Few-shot IL on unseen tasks: split of pretraining and test tasks for MetaWorld and LIBERO.
  • ...and 10 more figures
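
The greedy longest-token search described in the Figure 4 caption can be sketched as follows. This is one reading of the caption, not the authors' code: the helper names `longest_token_at` and `label_timesteps` and the toy vocabulary are hypothetical, and the vocabulary is assumed to map each token id to the tuple of primitive action codes it expands to.

```python
def longest_token_at(codes, t, vocab):
    """Return the id of the longest token whose expansion matches codes[t:]."""
    best_id, best_len = None, 0
    for token_id, expansion in vocab.items():
        n = len(expansion)
        # A slice that runs past the end is shorter than n and cannot match.
        if n > best_len and tuple(codes[t:t + n]) == expansion:
            best_id, best_len = token_id, n
    return best_id


def label_timesteps(codes, vocab):
    """Assign to every time step the longest token that starts there."""
    return [longest_token_at(codes, t, vocab) for t in range(len(codes))]


# Hypothetical vocabulary: three primitive codes plus two merged skill tokens.
vocab = {0: (0,), 1: (1,), 2: (2,), 3: (0, 1), 4: (0, 1, 2)}
codes = [0, 1, 0, 1, 2]
labels = label_timesteps(codes, vocab)
print(labels)  # [3, 1, 4, 1, 2]
```

Because every primitive code is itself a token, each time step always receives some label; longer skill tokens take precedence wherever their full expansion matches the upcoming codes.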