
DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

Jianbo Zhao, Taiyu Ban, Zhihao Liu, Hangning Zhou, Xiyang Wang, Qibin Zhou, Hailong Qin, Mu Yang, Lei Liu, Bin Li

TL;DR

DRoPE addresses efficient inter-agent interaction modeling for autonomous driving trajectory generation by introducing a directional extension of Rotary Position Embedding (RoPE) that encodes relative angular information. By unifying angular scalars into a global embedding, DRoPE enables angular-aware attention (DRoPE-RoPE) with $O(N)$ space complexity while preserving transformer-style time complexity, avoiding the $O(N^2)$ space blow-up of explicit RPE. Theoretical analysis demonstrates correctness in modeling periodic angles, and empirical results on Waymo Motion show state-of-the-art minADE with lower memory and FLOPs than RPE-based approaches. The approach yields scalable, accurate trajectory generation suitable for real-time deployment and suggests a path to efficient periodic angle modeling in other domains.
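The relative-angle property behind this can be written out as a short derivation. This is a sketch in our own notation, assuming the standard 2D-block formulation of RoPE in which each pair of query/key channels is rotated by a $2\times 2$ matrix; the symbols $q_i$, $k_j$, $\phi_i$, $\phi_j$ (agent headings) and $\theta_l$ (a per-block frequency scalar) are ours and may not match the paper's notation.

```latex
% Standard 2x2 rotation; rotations compose by adding angles.
R(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix},
\qquad R(\alpha)^{\top} R(\beta) = R(\beta - \alpha).

% Rotate agent i's query block by its heading and agent j's key block by its heading:
\big(R(\phi_i)\,q_i\big)^{\top}\big(R(\phi_j)\,k_j\big)
  = q_i^{\top} R(\phi_i)^{\top} R(\phi_j)\, k_j
  = q_i^{\top} R(\phi_j - \phi_i)\, k_j .

% With a RoPE-style frequency the rotation is R(\theta_l \phi) instead, and
R\big(\theta_l(\phi + 2\pi)\big) = R(\theta_l \phi)\, R(2\pi\theta_l) = R(\theta_l \phi)
\quad \text{only if } \theta_l \in \mathbb{Z}.
```

On this reading, a unit (identity) scalar $\theta_l = 1$ makes every block's logit depend only on $(\phi_j - \phi_i) \bmod 2\pi$, whereas RoPE's fractional frequencies treat $\phi$ and $\phi + 2\pi$ as different angles.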

Abstract

Accurate and efficient modeling of agent interactions is essential for trajectory generation, the core of autonomous driving systems. Existing methods (scene-centric, agent-centric, and query-centric frameworks) each present distinct advantages and drawbacks, creating an impossible triangle among accuracy, computational time, and memory efficiency. To break this limitation, we propose Directional Rotary Position Embedding (DRoPE), a novel adaptation of Rotary Position Embedding (RoPE), which was originally developed in natural language processing. Unlike traditional relative position embedding (RPE), which introduces significant space complexity, RoPE efficiently encodes relative positions without explicitly increasing complexity, but it faces inherent limitations in handling angular information due to periodicity. DRoPE overcomes this limitation by introducing a uniform identity scalar into RoPE's 2D rotary transformation, aligning rotation angles with realistic agent headings to naturally encode relative angular information. We theoretically analyze DRoPE's correctness and efficiency, demonstrating its capability to simultaneously optimize trajectory generation accuracy, time complexity, and space complexity. Empirical evaluations against various state-of-the-art trajectory generation models confirm DRoPE's strong performance and significantly reduced space complexity, indicating both theoretical soundness and practical effectiveness. The video documentation is available at https://drope-traj.github.io/.
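The periodicity issue and the effect of an identity scalar can be checked numerically. The following is a minimal pure-Python sketch, not the authors' implementation: it rotates a single 2D query/key channel pair by each agent's heading times a frequency `theta` and compares attention logits. `theta = 1.0` stands in for the identity scalar; `theta = 0.1` is an arbitrary illustrative value standing in for one of RoPE's lower frequencies. The function names, vectors, and headings are hypothetical.

```python
import math


def rotate(x, y, angle):
    """Rotate the 2D vector (x, y) counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def rotary_score(q, k, heading_q, heading_k, theta):
    """Dot-product logit for one 2D channel pair after rotating the query by
    heading_q * theta and the key by heading_k * theta (RoPE-style rotation
    with the agent heading used in place of a token position)."""
    qx, qy = rotate(q[0], q[1], heading_q * theta)
    kx, ky = rotate(k[0], k[1], heading_k * theta)
    return qx * kx + qy * ky


# Made-up query/key channel pair and agent headings (radians).
q, k = (0.3, -1.2), (0.8, 0.5)
h_i, h_j = 0.4, 2.9

for theta, label in [(0.1, "non-unit frequency (illustrative)"),
                     (1.0, "identity scalar")]:
    base = rotary_score(q, k, h_i, h_j, theta)
    # h_j + 2*pi is the same physical heading as h_j.
    wrapped = rotary_score(q, k, h_i, h_j + 2 * math.pi, theta)
    # Shifting both headings by the same offset keeps their relative angle.
    shifted = rotary_score(q, k, h_i + 1.0, h_j + 1.0, theta)
    print(f"{label}: {base:.4f} | wrapped {wrapped:.4f} | shifted {shifted:.4f}")
```

With `theta = 1.0` all three values agree up to floating-point rounding, so the logit depends only on the relative heading modulo a full turn. With `theta = 0.1` the shifted value still matches, because the rotation is always relative, but the wrapped value differs: a full turn of one agent's heading changes the logit.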

Paper Structure

This paper contains 21 sections, 5 theorems, 29 equations, 5 figures, 2 tables.

Key Result

Proposition 1

The space complexity for multi-head attention inputs under parallel computation is $\mathcal{O}(NH(2d_k+d_v))$.
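Proposition 1's count can be checked with simple arithmetic. The sketch below computes the Q/K/V element count stated in the proposition and contrasts it with a hypothetical explicit RPE that stores one relative key and one relative value embedding per head for every ordered token pair. That RPE layout, the function names, and the sizes are illustrative assumptions, not the paper's Definition 1 or its experimental settings.

```python
def attention_input_elements(n, h, d_k, d_v):
    """Elements held in Q, K and V for n tokens and h heads.

    Q and K are n x h x d_k each and V is n x h x d_v, which gives the
    N*H*(2*d_k + d_v) count stated in Proposition 1.
    """
    return n * h * (2 * d_k + d_v)


def explicit_rpe_extra_elements(n, h, d_k, d_v):
    """Hypothetical extra storage for an explicit RPE that materializes one
    relative key (width d_k) and one relative value (width d_v) per head for
    every ordered token pair: n*n*h*(d_k + d_v)."""
    return n * n * h * (d_k + d_v)


# Illustrative sizes (not taken from the paper): 8 heads, 32 channels each.
H, D_K, D_V = 8, 32, 32
for n in (64, 256, 1024):
    base = attention_input_elements(n, H, D_K, D_V)
    rpe = explicit_rpe_extra_elements(n, H, D_K, D_V)
    print(f"N={n:5d}  QKV={base:>12,}  explicit RPE extra={rpe:>14,}  ratio={rpe / base:.1f}")
```

Under these assumptions the ratio is $N(d_k + d_v)/(2d_k + d_v)$, here $2N/3$, so the pairwise storage overtakes the linear Q/K/V cost quickly as the number of tokens grows.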

Figures (5)

  • Figure 1: The impossible triangle of current trajectory generation methods.
  • Figure 2: RPE vs. RoPE in terms of space complexity.
  • Figure 3: The infeasibility of RoPE in handling the periodicity of angles.
  • Figure 4: Comparison of two integration methods for DRoPE and RoPE.
  • Figure 5: Comparison of training memory, evaluation memory, and FLOPs across different scene representation approaches.

Theorems & Definitions (15)

  • Proposition 1
  • proof
  • Definition 1: RPE
  • Definition 2: General RPE functions
  • Remark 1
  • Proposition 2
  • proof
  • Definition 3: RoPE
  • Corollary 1
  • proof
  • ...and 5 more