MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
Seongju Lee, Junseok Lee, Yeonguk Yu, Taeri Kim, Kyoobin Lee
TL;DR
MART introduces a MultiscAle Relational Transformer for multi-agent trajectory prediction, integrating a Pair-wise Relational Transformer and a Hyper Relational Transformer within a MARTE encoder to capture both individual and group interactions. The Adaptive Group Estimator infers overlapping group relations with a learnable threshold, enabling flexible group reasoning without predefined topologies. Across NBA, SDD, and ETH-UCY, MART achieves state-of-the-art or competitive results, notably reducing ADE by 3.9% and FDE by 11.8% on NBA compared with EqMotion, while requiring substantially fewer parameters and MAC operations than prior SOTA models. The approach advances trajectory forecasting by enabling robust group-aware attention, with potential extensions to incorporate temporal context and scene information for enhanced real-world planning.
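To make the idea of threshold-based, overlapping groups concrete, here is a minimal sketch. It is not the authors' Adaptive Group Estimator. The function name `estimate_groups`, the use of cosine similarity as the affinity, and the fixed (non-learned) threshold are all assumptions made for illustration. In a trainable model the threshold would be a parameter, and the hard comparison would need a differentiable surrogate.

```python
import numpy as np

def estimate_groups(features, threshold):
    """Hypothetical group inference by thresholding pairwise affinity.

    features:  (num_agents, dim) array of per-agent feature vectors.
    threshold: scalar cut-off (fixed here; assumed learnable in a real model).
    Returns a binary (num_agents, num_agents) incidence matrix in which
    row i marks the members of the group seeded by agent i.
    """
    # Normalize rows so the dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    affinity = unit @ unit.T
    # An agent joins every group whose seed it is sufficiently similar to,
    # so one agent can belong to several groups at once.
    return (affinity >= threshold).astype(int)

# Three agents; the middle one is similar to both of the others.
features = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
incidence = estimate_groups(features, threshold=0.7)
```

In this toy input the second agent appears in all three rows of `incidence`, which is what "overlapping" group membership means here. No group topology is specified in advance; it follows from the features and the threshold.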
Abstract
Multi-agent trajectory prediction is crucial to autonomous driving and to understanding the surrounding environment. Learning-based approaches for multi-agent trajectory prediction, which primarily rely on graph neural networks, graph transformers, and hypergraph neural networks, have demonstrated outstanding performance on real-world datasets in recent years. However, hypergraph transformer-based methods for trajectory prediction have yet to be explored. Therefore, we present a MultiscAle Relational Transformer (MART) network for multi-agent trajectory prediction. MART is a hypergraph transformer architecture that considers both individual and group behaviors within the transformer mechanism. The core module of MART is the encoder, which comprises a Pair-wise Relational Transformer (PRT) and a Hyper Relational Transformer (HRT). The encoder extends the capabilities of a relational transformer by introducing HRT, which integrates hyperedge features into the transformer mechanism, encouraging attention weights to focus on group-wise relations. In addition, we propose an Adaptive Group Estimator (AGE) designed to infer complex group relations in real-world environments. Extensive experiments on three real-world datasets (NBA, SDD, and ETH-UCY) demonstrate that our method achieves state-of-the-art performance, improving ADE/FDE by 3.9%/11.8% on the NBA dataset. Code is available at https://github.com/gist-ailab/MART.
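For readers unfamiliar with the reported metrics, the following sketch shows how ADE (average displacement error) and FDE (final displacement error) are commonly computed, and how a relative improvement such as the percentages above is derived. The function names and array layout are assumptions for illustration and are not taken from the MART code base. Benchmarks often report the minimum over several sampled predictions; this sketch handles a single prediction per agent.

```python
import numpy as np

def ade_fde(pred, gt):
    """Compute ADE and FDE for one predicted trajectory per agent.

    pred, gt: arrays of shape (num_agents, num_steps, 2) holding
    predicted and ground-truth (x, y) positions.
    """
    # Euclidean distance at every agent and time step.
    dist = np.linalg.norm(pred - gt, axis=-1)  # (num_agents, num_steps)
    ade = dist.mean()          # average over all agents and steps
    fde = dist[:, -1].mean()   # average over agents at the final step
    return ade, fde

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Toy example: two agents, three steps, prediction error grows over time.
gt = np.zeros((2, 3, 2))
pred = np.zeros((2, 3, 2))
pred[:, :, 0] = [1.0, 2.0, 3.0]  # x-offset of 1, 2, 3 at successive steps
ade, fde = ade_fde(pred, gt)
```

Here the per-step errors are 1, 2, and 3 for both agents, so ADE is their mean and FDE is the error at the last step. Lower is better for both metrics, so an improvement is reported as a relative reduction against the baseline value.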
