MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction
Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
TL;DR
MapTRv2 introduces an end-to-end framework for online vectorized HD map construction that models map elements as permutation-equivalent point sets, enabling stable learning for arbitrarily shaped elements. It leverages a hierarchical query embedding and decoupled self-attention within a Transformer encoder-decoder to efficiently predict both undirected and directed map elements, supported by hierarchical bipartite matching and auxiliary dense supervision to accelerate convergence. The approach achieves real-time performance and state-of-the-art accuracy on nuScenes and Argoverse2, and extends to centerline learning and 3D map reconstruction. These advances offer a practical, scalable module for autonomous driving pipelines and downstream planning tasks.
Abstract
High-definition (HD) maps provide abundant and precise static environmental information about the driving scene, and serve as a fundamental and indispensable component for planning in autonomous driving systems. In this paper, we present Map TRansformer (MapTR), an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method copes well with map elements of arbitrary shape. It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets. Abundant qualitative results show stable and robust map construction quality in complex and varied driving scenes. Code and more demos are available at https://github.com/hustvl/MapTR to facilitate further studies and applications.
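The permutation-equivalent modeling described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names `equivalent_permutations` and `permutation_invariant_l1` are hypothetical, and the sketch assumes that an undirected open polyline has two equivalent orderings (forward and reversed) while a closed polygon with N points has 2N (any start point, in either direction). The point-level cost is then taken as the minimum over those orderings, so a prediction is not penalized for picking a different but equivalent ordering.

```python
import numpy as np


def equivalent_permutations(points, closed):
    """Return all equivalent orderings of a point set, shape (K, N, 2).

    Assumption for this sketch: open polylines have K = 2 orderings,
    closed polygons have K = 2 * N orderings.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not closed:
        # Open polyline: forward and reversed order describe the same shape.
        return np.stack([points, points[::-1]])
    # Closed polygon: every cyclic shift, in both traversal directions.
    reversed_points = points[::-1]
    orderings = [np.roll(points, -s, axis=0) for s in range(n)]
    orderings += [np.roll(reversed_points, -s, axis=0) for s in range(n)]
    return np.stack(orderings)


def permutation_invariant_l1(pred, gt, closed):
    """L1 distance between pred and gt, minimized over gt's equivalent orderings."""
    perms = equivalent_permutations(gt, closed)          # (K, N, 2)
    pred = np.asarray(pred, dtype=float)                 # (N, 2), broadcast over K
    costs = np.abs(perms - pred).sum(axis=(1, 2))        # (K,)
    return float(costs.min())
```

For example, with `lane = [[0, 0], [1, 0], [2, 0]]`, calling `permutation_invariant_l1(lane[::-1], lane, closed=False)` yields a zero cost, whereas a fixed-order L1 comparison of the same two point lists would not.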
