Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums

Beihao Xia, Conghao Wong, Duanquan Xu, Qinmu Peng, Xinge You

TL;DR

This work tackles the forecasting of heterogeneous trajectories, which come in diverse representations, by reframing trajectories as time-frequency spectrums. It introduces V$^{2}$-Net and its enhanced version E-V$^{2}$-Net, which use transforms (the DFT and the Haar transform) to capture the dynamics of each trajectory dimension, together with a bilinear fusion structure that models interactions across trajectory dimensions. The approach predicts hierarchically across frequency scales and applies to multiple trajectory forms, using Transformer-based encoders/decoders to integrate spectrum representations and interactions. Empirically, E-V$^{2}$-Net variants achieve strong or state-of-the-art performance on ETH-UCY, SDD, nuScenes, and Human3.6M across 2D coordinates, bounding boxes, and 3D skeletons, while analyses highlight transform-specific trade-offs and the value of dimension-wise interactions for high-dimensional trajectories.
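To make the spectrum view concrete, the following sketch treats a trajectory as an N × M array (N frames, M trajectory dimensions) and applies a plain DFT to each dimension separately, then computes the share of energy carried by each frequency component, similar in spirit to the energy percentages plotted in Figure 5 (the paper's exact normalization may differ). It is an illustrative O(N²) implementation written for this summary, not the authors' code; the function names (`trajectory_spectrums`, `energy_percentages`) and the toy trajectory are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform of a 1-D real sequence (O(N^2), for illustration)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, which recovers a real input signal."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def trajectory_spectrums(traj):
    """Hypothetical helper: DFT applied independently to each of the M dimensions
    of an N x M trajectory. Returns M spectrums, each of length N."""
    m = len(traj[0])
    return [dft([frame[d] for frame in traj]) for d in range(m)]


def energy_percentages(spectrum):
    """Hypothetical helper: share of total energy |X_k|^2 in each frequency component."""
    energies = [abs(c) ** 2 for c in spectrum]
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else [0.0] * len(energies)


# Toy 2D-coordinate trajectory (M = 2) with N = 8 frames.
traj = [[0.5 * t, 0.1 * t * t] for t in range(8)]
specs = trajectory_spectrums(traj)
recovered_x = idft(specs[0])  # round trip back to the x coordinates
```

In this toy example the zero-frequency (DC) term carries the largest share of the y dimension's energy, which is typical of smooth, slowly varying motion.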

Abstract

With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to simple scenes and trajectories. More and more trajectories with different forms, such as coordinates, bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a single trajectory frame, which we call "Dimension-wise Interactions", are more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, and potential dimension-wise interactions have received less attention. In this work, we expand the trajectory prediction task by introducing the trajectory dimensionality $M$, thus extending its application scenarios to heterogeneous trajectories. We first introduce the Haar transform as an alternative to the Fourier transform to better capture the time-frequency properties of each trajectory dimension. Then, we adopt a bilinear structure to model and fuse two factors simultaneously, the time-frequency response and the dimension-wise interaction, to forecast heterogeneous trajectories hierarchically via trajectory spectrums in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, SDD, nuScenes, and Human3.6M with heterogeneous trajectories, including 2D coordinates, 2D/3D bounding boxes, and 3D human skeletons.
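The Haar alternative can be sketched in the same N × M setting. One level of the orthonormal Haar transform splits each dimension into pairwise averages (low frequency) and pairwise differences (high frequency); because each coefficient depends only on two neighboring frames, it stays localized in time, unlike DFT basis functions, which span the whole observation window. This is a single-level sketch under that assumption: the transform depth, normalization, and handling of odd-length inputs used in the paper are not given here, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_1d(signal):
    """One level of the orthonormal Haar transform of an even-length sequence.

    Returns (approximation, detail): scaled averages and differences of
    consecutive sample pairs.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_1d(approx, detail):
    """Exactly invert haar_1d by reconstructing each pair of samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_per_dimension(traj):
    """Hypothetical helper: apply haar_1d to every dimension of an N x M trajectory."""
    m = len(traj[0])
    return [haar_1d([frame[d] for frame in traj]) for d in range(m)]


# Toy 2D bounding-box trajectory (M = 4: x1, y1, x2, y2) over N = 8 frames,
# moving right at constant speed while staying at the same height.
box_traj = [[t, 0.0, t + 2.0, 1.0] for t in range(8)]
coeffs = haar_per_dimension(box_traj)
```

Here x1 moves at constant speed, so every detail coefficient of x1 equals -1/√2, while the stationary y1 dimension has all-zero details; inverse_haar_1d recovers the original coordinates.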

Paper Structure

This paper contains 17 sections, 48 equations, 15 figures, 17 tables.

Figures (15)

  • Figure 1: Examples of several trajectory forms. Different forms of trajectories may exist in complex scenarios regardless of agent categories. Trajectories are no longer limited to 2D coordinate series.
  • Figure 2: Illustrations of challenges in heterogeneous trajectory prediction, i.e., "Time-Frequency Response" and "Dimension-wise Interaction".
  • Figure 3: V$^{2}$-Net Overview. It has two sub-networks: keypoints estimation and spectrum interpolation. It forecasts 2D coordinate trajectories hierarchically, "from coarse to fine", via Fourier spectrums.
  • Figure 4: Bilinear Structure in E-V$^{2}$-Net. It takes spectrums $\mathbf{\mathcal{S}} \in \mathbb{R}^{\mathcal{N}_h \times \mathcal{M}}$ as the input, and finally outputs the refined spectrum features $\mathbf{f}_R \in \mathbb{R}^{\mathcal{N}_h \times 64}$.
  • Figure 5: The average energy percentages of the observed trajectories at different moments and on different frequency components.
  • ...and 10 more figures
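Figure 4 fixes only the input and output shapes of the bilinear structure, $\mathbf{\mathcal{S}} \in \mathbb{R}^{\mathcal{N}_h \times \mathcal{M}}$ and $\mathbf{f}_R \in \mathbb{R}^{\mathcal{N}_h \times 64}$. The sketch below shows one generic way to build a bilinear fusion with those shapes: two linear branches embed each spectrum row, and their flattened outer product is projected to 64 features. Because each output is a weighted sum of products $s_{i,p} s_{i,q}$, every pair of trajectory dimensions can interact multiplicatively. The branch sizes, random weights, and function names are assumptions made for illustration; this is not the layer used in E-V$^{2}$-Net.

```python
import random


def linear(rows, weight):
    """Multiply an (R x K) matrix by a (K x D) matrix, both given as lists of lists."""
    return [
        [sum(r[k] * weight[k][d] for k in range(len(weight))) for d in range(len(weight[0]))]
        for r in rows
    ]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical random initialization, for illustration only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def bilinear_fusion(spectrum, d_a=8, d_c=8, d_out=64, seed=0):
    """Hypothetical bilinear fusion of two linear views of each spectrum row.

    spectrum: N_h x M list of lists (N_h spectrum rows, M trajectory dimensions).
    Returns an N_h x d_out feature matrix.
    """
    rng = random.Random(seed)
    m = len(spectrum[0])
    w_a = random_matrix(m, d_a, rng)  # branch standing in for the time-frequency response
    w_c = random_matrix(m, d_c, rng)  # branch standing in for the dimension-wise interaction
    proj = random_matrix(d_a * d_c, d_out, rng)

    a = linear(spectrum, w_a)  # N_h x d_a
    c = linear(spectrum, w_c)  # N_h x d_c
    # Flattened outer product per row: every feature of `a` times every feature of `c`.
    outer = [[ai * cj for ai in a_row for cj in c_row] for a_row, c_row in zip(a, c)]
    return linear(outer, proj)  # N_h x d_out


# Toy spectrum with N_h = 4 rows and M = 6 dimensions (arbitrary values).
rng = random.Random(42)
spec = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
f_r = bilinear_fusion(spec)
```

Since both branches read the same spectrum, the map is quadratic in $\mathbf{\mathcal{S}}$: doubling the input quadruples every output feature, which is a quick sanity check that the fusion is multiplicative rather than linear.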