MacFormer: Map-Agent Coupled Transformer for Real-time and Robust Trajectory Prediction

Chen Feng, Hangning Zhou, Huadong Lin, Zhigang Zhang, Ziyao Xu, Chi Zhang, Boyu Zhou, Shaojie Shen

TL;DR

MacFormer is a Map-Agent Coupled Transformer for real-time trajectory prediction that integrates map constraints directly through a coupled map and a reference extractor, guided by a multi-task optimization strategy. A bilateral query scheme enables efficient cross-domain context fusion, reducing computational complexity while maintaining accuracy. The approach achieves state-of-the-art results on Argoverse 1, Argoverse 2, and nuScenes with lower latency and fewer parameters, is robust to imperfect tracklets, and improves classical models when its strategies are applied to them. By explicitly modeling map-trajectory coupling and multimodal centerlines, the method improves prediction fidelity and practical deployability in autonomous navigation.

Abstract

Predicting the future behavior of agents is a fundamental task in autonomous vehicle domains. Accurate prediction relies on comprehending the surrounding map, which significantly regularizes agent behaviors. However, existing methods have limitations in exploiting the map and exhibit a strong dependence on historical trajectories, resulting in unsatisfactory prediction performance and robustness. Additionally, their heavy network architectures impede real-time applications. To tackle these problems, we propose the Map-Agent Coupled Transformer (MacFormer) for real-time and robust trajectory prediction. Our framework explicitly incorporates map constraints into the network via two carefully designed modules, the coupled map and the reference extractor. A novel multi-task optimization strategy (MTOS) is presented to enhance learning of topology and rule constraints. We also devise a bilateral query scheme in context fusion for a more efficient and lightweight network. We evaluated our approach on the Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, on all of which it achieved state-of-the-art performance with the lowest inference latency and smallest model size. Experiments also demonstrate that our framework is resilient to imperfect tracklet inputs. Furthermore, we show that classical models combined with our proposed strategies outperform their baselines, further validating the versatility of our framework.

Paper Structure (14 sections, 20 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: Overview of map utilization in trajectory prediction. Existing works (top) adopt an implicit-fusion or target-driven approach, which does not fully exploit the map. In contrast, our method (bottom) explicitly and thoroughly leverages map constraints, effectively capturing prediction uncertainty and enhancing robustness to imperfect tracklets. (Sect. \ref{sec:intro})
  • Figure 2: (a) System overview of MacFormer. (b) Detailed implementation of MacFormer and the output size of each operation.
  • Figure 3: (a) Illustration of map constraints on multi-modality. (b) Illustration of segments in a vectorized map.
  • Figure 4: Detailed structure of TopoGate and MotionGate. (Sect. \ref{sub:coupled_layer})
  • Figure 5: Overview of the proposed reference extractor. (Sect. \ref{sub:ref_ext})
  • ...and 4 more figures