Autoregressive Meta-Actions for Unified Controllable Trajectory Generation

Jianbo Zhao, Taiyu Ban, Xiyang Wang, Qibin Zhou, Hangning Zhou, Zhihao Liu, Mu Yang, Lei Liu, Bin Li

TL;DR

This work addresses temporal misalignment in controllable trajectory generation by introducing Autoregressive Meta-Actions, which decompose high-level decisions into frame-level actions and align each frame’s trajectory with its corresponding action. A unified task formulation combines autoregressive meta-action prediction with meta-action-conditioned trajectory generation, enabling consistent frame-level control across the horizon. The model employs a foundation trajectory generator with RoPE-based attention, a dedicated meta-action prediction module with causal conditioning, and a meta-action injection module trained in a staged fashion; a Waymo-based dataset with frame-level labels supports evaluation. Empirical results show improved decision-following performance (higher mAP) and maintained trajectory quality, with ablations confirming the value of historical meta-actions and modular training for flexibility and stability.

Abstract

Controllable trajectory generation guided by high-level semantic decisions, termed meta-actions, is crucial for autonomous driving systems. A significant limitation of existing frameworks is their reliance on invariant meta-actions assigned over fixed future time intervals, which causes temporal misalignment with the actual behavior trajectories. This misalignment leads to irrelevant associations between the prescribed meta-actions and the resulting trajectories, disrupting task coherence and limiting model performance. To address this challenge, we introduce Autoregressive Meta-Actions, an approach integrated into autoregressive trajectory generation frameworks that provides a unified and precise definition for meta-action-conditioned trajectory prediction. Specifically, we decompose traditional long-interval meta-actions into frame-level meta-actions, enabling a sequential interplay between autoregressive meta-action prediction and meta-action-conditioned trajectory generation. This decomposition ensures strict alignment between each trajectory segment and its corresponding meta-action, yielding a consistent and unified task formulation across the entire trajectory span and significantly reducing complexity. Moreover, we propose a staged pre-training process that decouples the learning of basic motion dynamics from the integration of high-level decision control, which offers flexibility, stability, and modularity. Experimental results validate the framework's effectiveness, demonstrating improved trajectory adaptivity and responsiveness in dynamic decision-making scenarios. The video and dataset are available at https://arma-traj.github.io/.
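The sequential interplay described above, where a meta-action is chosen for each frame and the next trajectory step is then generated conditioned on it, can be sketched as a simple rollout loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_meta_action` and `generate_next_state` are hypothetical toy stand-ins for the learned prediction and generation modules, the `State` type is a bare (x, y) position, and the label set is invented for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[float, float]  # hypothetical minimal state: (x, y) position

# Hypothetical lateral offset per frame for each meta-action label.
LATERAL_STEP = {
    "keep_lane": 0.0,
    "left_lane_change": 0.5,
    "right_lane_change": -0.5,
}


def predict_meta_action(states: Sequence[State], past_actions: Sequence[str]) -> str:
    """Toy stand-in for a learned meta-action predictor.

    It simply repeats the most recent meta-action, which mimics conditioning
    on historical meta-actions; a real model would be a trained network.
    """
    return past_actions[-1] if past_actions else "keep_lane"


def generate_next_state(states: Sequence[State], action: str) -> State:
    """Toy stand-in for a meta-action-conditioned trajectory generator."""
    x, y = states[-1]
    return (x + 1.0, y + LATERAL_STEP[action])


def rollout(
    history: Sequence[State],
    num_frames: int,
    injected: Optional[Dict[int, str]] = None,
) -> Tuple[List[str], List[State]]:
    """Alternate meta-action selection and one-frame trajectory generation.

    `injected` maps a frame index to an externally supplied meta-action;
    frames without an entry fall back to the autoregressive prediction.
    """
    states = list(history)
    actions: List[str] = []
    for t in range(num_frames):
        forced = injected.get(t) if injected else None
        action = forced if forced is not None else predict_meta_action(states, actions)
        actions.append(action)
        # Each generated frame is paired with exactly one meta-action.
        states.append(generate_next_state(states, action))
    return actions, states[len(history):]


if __name__ == "__main__":
    acts, traj = rollout([(0.0, 0.0)], 4, injected={1: "left_lane_change"})
    for a, s in zip(acts, traj):
        print(a, s)
```

Because every frame carries its own meta-action, an injected decision takes effect at the frame where it is supplied rather than being spread over a fixed interval. In this toy version the injected action persists only because the stand-in predictor repeats the previous action.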

Paper Structure

This paper contains 50 sections, 2 theorems, 37 equations, 9 figures, 1 table.

Key Result

Proposition 3.1

The regression-based task formulation (the equation labeled eq:regression_condition_task in the paper) is not unified with respect to mapping input histories and meta-actions to future trajectories.
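The temporal misalignment that the abstract attributes to interval-level meta-actions can be illustrated with a toy labeling experiment. The sketch below is not the paper's construction or proof. The windowing rule (a window of `T` frames takes the first non-default frame-level label it contains) and the label names are assumptions chosen only for illustration.

```python
from typing import List


def interval_labels(frame_labels: List[str], T: int, default: str = "keep_lane") -> List[str]:
    """Label each frame t with one meta-action for the window [t, t + T).

    Assumed rule: use the first non-default frame-level label inside the
    window, or the default label if the window contains none.
    """
    out = []
    for t in range(len(frame_labels)):
        window = frame_labels[t:t + T]
        out.append(next((a for a in window if a != default), default))
    return out


def misaligned_frames(frame_labels: List[str], T: int) -> List[int]:
    """Indices where the interval-level label disagrees with the frame's own behavior."""
    coarse = interval_labels(frame_labels, T)
    return [t for t, (c, f) in enumerate(zip(coarse, frame_labels)) if c != f]


if __name__ == "__main__":
    frames = ["keep_lane"] * 3 + ["left_lane_change"] * 2 + ["keep_lane"] * 2
    print(misaligned_frames(frames, T=3))  # frames labeled as a lane change while still keeping lane
    print(misaligned_frames(frames, T=1))  # frame-level labels: no mismatch
```

With `T=3`, frames 1 and 2 inherit the lane-change label even though the vehicle is still keeping its lane there. With `T=1`, every label matches its frame by construction, which is the alignment property that frame-level meta-actions are meant to provide.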

Figures (9)

  • Figure 1: Example of different meta-actions in a sliding window with interval $T>1$.
  • Figure 2: The overall diagram of the proposed model architecture.
  • Figure 3: Visual comparison of various models for following decision "Left Lane Change".
  • Figure 4: Frame-by-frame visualization of predicted meta-actions and resulting ego trajectories. Colored dots represent the ego vehicle’s state and corresponding meta-action at each timestep. Different subplots illustrate behaviors under autoregressively sampled meta-actions or specific injected meta-actions such as Left Lane Change and Turn Left.
  • Figure 5: Illustration of the temporal stages of a lane change maneuver. The process is divided into three phases: initiation (the vehicle begins lateral deviation), mid-transition (the vehicle straddles the lane boundary), and completion (the vehicle stabilizes within the target lane). Accurate identification of these stages is essential for consistent meta-action prediction and interpretable decision modeling.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Definition 1: Task Unification
  • Example 1
  • Proposition 3.1
  • Proposition 3.2