Autoregressive Meta-Actions for Unified Controllable Trajectory Generation
Jianbo Zhao, Taiyu Ban, Xiyang Wang, Qibin Zhou, Hangning Zhou, Zhihao Liu, Mu Yang, Lei Liu, Bin Li
TL;DR
This work addresses temporal misalignment in controllable trajectory generation by introducing Autoregressive Meta-Actions, which decompose high-level decisions into frame-level actions and align each frame’s trajectory with its corresponding action. A unified task formulation combines autoregressive meta-action prediction with meta-action-conditioned trajectory generation, enabling consistent frame-level control across the horizon. The model employs a foundation trajectory generator with RoPE-based attention, a dedicated meta-action prediction module with causal conditioning, and a meta-action injection module trained in a staged fashion; a Waymo-based dataset with frame-level labels supports evaluation. Empirical results show improved decision-following performance (higher mAP) and maintained trajectory quality, with ablations confirming the value of historical meta-actions and modular training for flexibility and stability.
Abstract
Controllable trajectory generation guided by high-level semantic decisions, termed meta-actions, is crucial for autonomous driving systems. A significant limitation of existing frameworks is their reliance on invariant meta-actions assigned over fixed future time intervals, causing temporal misalignment with the actual behavior trajectories. This misalignment leads to irrelevant associations between the prescribed meta-actions and the resulting trajectories, disrupting task coherence and limiting model performance. To address this challenge, we introduce Autoregressive Meta-Actions, an approach integrated into autoregressive trajectory generation frameworks that provides a unified and precise definition for meta-action-conditioned trajectory prediction. Specifically, we decompose traditional long-interval meta-actions into frame-level meta-actions, enabling a sequential interplay between autoregressive meta-action prediction and meta-action-conditioned trajectory generation. This decomposition ensures strict alignment between each trajectory segment and its corresponding meta-action, achieving a consistent and unified task formulation across the entire trajectory span and significantly reducing complexity. Moreover, we propose a staged pre-training process to decouple the learning of basic motion dynamics from the integration of high-level decision control, which offers flexibility, stability, and modularity. Experimental results validate our framework's effectiveness, demonstrating improved trajectory adaptivity and responsiveness to dynamic decision-making scenarios. The accompanying video and dataset are available at https://arma-traj.github.io/.
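The sequential interplay described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `rollout`, the two toy stand-ins, the 2D `State` representation, and the label names `turn_left` and `keep_lane` are all hypothetical and chosen only for illustration. The sketch shows the control flow only: at each frame a meta-action is predicted from the history of states and past meta-actions, and the next state is then generated conditioned on that meta-action, so every generated frame is paired with exactly one meta-action.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # toy (x, y) position per frame; an assumption for illustration
MetaAction = str             # hypothetical frame-level labels such as "turn_left"


def rollout(
    history: List[State],
    past_actions: List[MetaAction],
    predict_action: Callable[[List[State], List[MetaAction]], MetaAction],
    generate_state: Callable[[List[State], MetaAction], State],
    num_frames: int,
) -> Tuple[List[State], List[MetaAction]]:
    """Alternate meta-action prediction and conditioned state generation."""
    states = list(history)
    actions = list(past_actions)
    for _ in range(num_frames):
        # Step 1: predict this frame's meta-action from everything seen so far,
        # including previously predicted meta-actions (causal conditioning).
        action = predict_action(states, actions)
        # Step 2: generate this frame's state conditioned on that meta-action.
        state = generate_state(states, action)
        actions.append(action)
        states.append(state)
    # Return only the newly generated frames and their aligned meta-actions.
    return states[len(history):], actions[len(past_actions):]


def toy_predict_action(states: List[State], actions: List[MetaAction]) -> MetaAction:
    # Hypothetical rule standing in for a learned predictor:
    # steer left until the lateral offset reaches 1.0, then keep the lane.
    return "turn_left" if states[-1][1] < 1.0 else "keep_lane"


def toy_generate_state(states: List[State], action: MetaAction) -> State:
    # Hypothetical dynamics standing in for a learned generator:
    # advance 1.0 forward each frame, shift 0.5 laterally while turning.
    x, y = states[-1]
    lateral_step = 0.5 if action == "turn_left" else 0.0
    return (x + 1.0, y + lateral_step)


future_states, future_actions = rollout(
    [(0.0, 0.0)], [], toy_predict_action, toy_generate_state, num_frames=4
)
print(future_actions)  # ['turn_left', 'turn_left', 'keep_lane', 'keep_lane']
```

In this toy rollout the label switches from `turn_left` to `keep_lane` at the frame where the lateral motion actually stops. A single label fixed over the whole interval could not express that switch, which is the temporal misalignment the abstract describes.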
