OAT: Ordered Action Tokenization
Chaoqi Liu, Xiaoshen Han, Jiawei Gao, Yue Zhao, Haonan Chen, Yilun Du
TL;DR
The paper addresses how to discretize continuous robot actions for autoregressive policies by formalizing three desiderata: high compression, total decodability, and left-to-right causal ordering. It introduces Ordered Action Tokenization (OAT), a tokenizer that combines a transformer with register tokens, finite scalar quantization, and nested dropout to produce an ordered, prefix-decodable token space that aligns with next-token prediction. Empirically, OAT outperforms naive binning, FAST, and learned latent tokenizers across more than 20 simulation and real-world tasks, and it supports anytime decoding, which trades computation for action fidelity. The work argues that token-space ordering is a crucial inductive bias for stable, scalable autoregressive learning and proposes OAT as a versatile component for future robot learning pipelines and VLAs.
Abstract
Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches rely either on analytical discretization methods that produce prohibitively long token sequences, or on learned latent tokenizers that lack structure, which limits their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization (high compression, total decodability, and a left-to-right causally ordered token space) and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using a transformer with registers, finite scalar quantization, and ordering-inducing training mechanisms. The resulting token space aligns naturally with autoregressive generation and enables prefix-based detokenization, yielding an anytime trade-off between inference cost and action fidelity. Across more than 20 tasks spanning four simulation benchmarks and real-world settings, autoregressive policies equipped with OAT consistently outperform prior tokenization schemes and diffusion-based baselines, while offering significantly greater flexibility at inference time.
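To make two of the ingredients named above more concrete, the following is a minimal NumPy sketch of finite scalar quantization and of prefix truncation in the spirit of nested dropout. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `tanh` bounding, the level counts, and the zero-filling of dropped tokens are all hypothetical choices, and the transformer encoder/decoder and the straight-through gradient used in training are omitted.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Simplified finite scalar quantization (illustrative variant).

    Each latent dimension d is squashed into (-1, 1) and rounded to one of
    levels[d] evenly spaced values. Returns integer indices and the
    corresponding quantized values in [-1, 1].
    """
    levels = np.asarray(levels, dtype=float)
    half = (levels - 1.0) / 2.0
    bounded = np.tanh(z)                                # bound each dimension
    idx = np.round(bounded * half + half).astype(int)   # integers in [0, L-1]
    codes = (idx - half) / half                         # map back to [-1, 1]
    return idx, codes

def to_token_ids(idx, levels):
    """Combine per-dimension indices into one token id (mixed-radix code)."""
    bases = np.cumprod([1] + list(levels[:-1]))
    return idx @ bases

def keep_prefix(codes, k):
    """Keep the first k token latents and zero the rest.

    Assumption: dropped tokens are replaced by zeros; a learned mask
    embedding would be another plausible choice.
    """
    mask = (np.arange(codes.shape[0]) < k)[:, None]
    return codes * mask

def sample_prefix_length(num_tokens, rng):
    """Nested-dropout-style sampling: draw how many leading tokens to keep."""
    return int(rng.integers(1, num_tokens + 1))

# Two register-token latents with three dimensions each (hypothetical sizes).
levels = [5, 5, 5]                       # implicit codebook of 5 * 5 * 5 = 125 tokens
z = np.array([[0.0, 10.0, -10.0],
              [10.0, 10.0, 10.0]])
idx, codes = fsq_quantize(z, levels)
token_ids = to_token_ids(idx, levels)    # one discrete token per register
prefix = keep_prefix(codes, k=1)         # what a decoder would see for a 1-token prefix
```

Training a decoder to reconstruct the action chunk from randomly truncated prefixes pushes earlier tokens to carry the coarsest information, which is what would let a policy stop generating early and still decode a usable, lower-fidelity action.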
