MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction

Seongju Lee, Junseok Lee, Yeonguk Yu, Taeri Kim, Kyoobin Lee

TL;DR

MART introduces a MultiscAle Relational Transformer for multi-agent trajectory prediction. Its encoder (MARTE) combines a Pair-wise Relational Transformer and a Hyper Relational Transformer to capture both individual and group interactions. The Adaptive Group Estimator infers overlapping group relations with a learnable threshold, enabling flexible group reasoning without predefined topologies. Across NBA, SDD, and ETH-UCY, MART achieves state-of-the-art or competitive results, notably reducing ADE by 3.9% and FDE by 11.8% on NBA compared with EqMotion, while requiring substantially fewer parameters and multiply-accumulate (MAC) operations than prior SOTA models. The approach advances trajectory forecasting by enabling robust group-aware attention, with potential extensions to incorporate temporal context and scene information for enhanced real-world planning.

Abstract

Multi-agent trajectory prediction is crucial to autonomous driving and understanding the surrounding environment. Learning-based approaches for multi-agent trajectory prediction, which rely primarily on graph neural networks, graph transformers, and hypergraph neural networks, have demonstrated outstanding performance on real-world datasets in recent years. However, hypergraph transformer-based methods for trajectory prediction have yet to be explored. Therefore, we present a MultiscAle Relational Transformer (MART) network for multi-agent trajectory prediction. MART is a hypergraph transformer architecture that considers individual and group behaviors within the transformer machinery. The core module of MART is the encoder, which comprises a Pair-wise Relational Transformer (PRT) and a Hyper Relational Transformer (HRT). The encoder extends the capabilities of a relational transformer by introducing HRT, which integrates hyperedge features into the transformer mechanism, encouraging attention weights to focus on group-wise relations. In addition, we propose an Adaptive Group Estimator (AGE) designed to infer complex group relations in real-world environments. Extensive experiments on three real-world datasets (NBA, SDD, and ETH-UCY) demonstrate that our method achieves state-of-the-art performance, improving ADE/FDE by 3.9%/11.8% on the NBA dataset. Code is available at https://github.com/gist-ailab/MART.
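
The ADE and FDE figures quoted above are displacement errors between predicted and ground-truth future positions. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names `displacement_errors` and `best_of_k` are hypothetical, positions are assumed to be 2D `(x, y)` tuples, and the best-of-K variant assumes the minimum ADE and minimum FDE are selected independently across samples.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one agent.

    pred, truth: equal-length, non-empty lists of (x, y) future positions.
    ADE is the mean Euclidean error over all time steps;
    FDE is the Euclidean error at the final time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return sum(dists) / len(dists), dists[-1]


def best_of_k(samples, truth):
    """Return (min ADE, min FDE) over K sampled predictions.

    Assumption: the two minima are taken independently, so they may
    come from different samples.
    """
    errors = [displacement_errors(sample, truth) for sample in samples]
    return min(ade for ade, _ in errors), min(fde for _, fde in errors)


# Toy example with a 3-step horizon and two sampled predictions.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sample_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # drifts away over time
sample_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # constant offset of 1
print(displacement_errors(sample_a, truth))  # (1.0, 2.0)
print(best_of_k([sample_a, sample_b], truth))  # (1.0, 1.0)
```

Under this definition, a relative improvement such as the reported 3.9%/11.8% compares the metric value of one method against that of a baseline on the same dataset.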

Paper Structure (29 sections, 14 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: (a) Results of the ablation study on different interaction encoders on the NBA dataset. (b) Baseline prediction system used in the ablation study. We conduct an ablation study on interaction encoders to compare the MART encoder with state-of-the-art encoders. The average displacement error (ADE) and final displacement error (FDE) are assessed using the NBA dataset.
  • Figure 2: Model architecture of the proposed MultiscAle Relational Transformer (MART) network. Our model has four components: feature initialization (sky blue), Adaptive Group Estimator (orange), MART encoder (red), and future trajectory decoder (Decoder) (purple). The MART encoder includes a pair-wise relational transformer (green) and a hyper relational transformer (pink).
  • Figure 3: Illustration of the Adaptive Group Estimator (AGE) module. The AGE module employs adaptive thresholding inspired by the straight-through estimator (STE) trick (Bengio et al., 2013). The blue and red arrows represent the forward and estimated backward paths, respectively.
  • Figure 4: Architecture of MARTE. The regions in green, pink, and red correspond to those in Figure 2. The black/purple/coral arrows denote the flow of node features/edge features/group incidence matrix.
  • Figure 5: Qualitative comparison on the NBA dataset. We qualitatively compare our method with two recent state-of-the-art (SOTA) methods using the best of 20 predictions. (a) illustrates predictions for all agents, and (b) focuses on the ball-keeping player and the basketball. Our method predicts more accurate future trajectories than LED (Mao et al., 2023) and EqMotion (Xu et al., 2023). Notably, as illustrated in (b), MART provides robust predictions for the ball-keeping player and the basketball, while the other methods do not. These results demonstrate that MART can handle more complex interactions than the SOTA methods. (Light color represents the past trajectory, while blue/red/green indicate the two teams and the basketball.)
  • ...and 7 more figures
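
The Figure 3 caption describes adaptive thresholding with a hard forward path and an estimated backward path. A minimal sketch of the generic straight-through idea follows, with gradients written out by hand. This is not the AGE module itself: the function names, the score matrix, and the choice of an identity surrogate gradient are all assumptions made for illustration, and the paper's exact formulation may differ.

```python
def hard_threshold_forward(scores, theta):
    """Forward path: binarize pairwise scores with a hard step at theta.

    scores: list of rows of floats (hypothetical pairwise affinities).
    Returns a mask of the same shape with 1.0 where score > theta.
    """
    return [[1.0 if s > theta else 0.0 for s in row] for row in scores]


def hard_threshold_backward(grad_mask):
    """Estimated backward path (straight-through, identity surrogate).

    The hard step has zero gradient almost everywhere, so the estimator
    pretends that mask = score - theta during backpropagation:
      d mask / d score = 1   -> gradients pass through unchanged
      d mask / d theta = -1  -> theta gets the negated sum of gradients
    """
    grad_scores = [row[:] for row in grad_mask]
    grad_theta = -sum(g for row in grad_mask for g in row)
    return grad_scores, grad_theta


# Toy example: two agents, threshold 0.5.
scores = [[0.9, 0.2], [0.4, 0.7]]
mask = hard_threshold_forward(scores, theta=0.5)
print(mask)  # [[1.0, 0.0], [0.0, 1.0]]

# Suppose the downstream loss produced these gradients w.r.t. the mask.
grad_mask = [[1.0, -2.0], [0.5, 0.0]]
grad_scores, grad_theta = hard_threshold_backward(grad_mask)
print(grad_theta)  # 0.5
```

In an autograd framework the same effect is usually obtained with a custom backward function or a detach-based rewrite rather than hand-written gradients. The point of the sketch is only that the threshold receives a usable gradient even though the forward output is binary.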