M3TR: A Generalist Model for Real-World HD Map Completion

Fabian Immel, Richard Fehler, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

TL;DR

This work tackles real-world HD map completion under outdated offline priors by introducing M3TR, a generalist transformer that can exploit arbitrary map priors or operate without them. It pairs a novel two-level query design with a map-masking augmentation regime and a prior-aware completion metric, enabling a single model to match or exceed specialized experts across diverse prior scenarios. A comprehensive HD Map Completion Benchmark with improved ground truth and the $\text{mAP}^{\mathcal{C}}$ metric grounds the evaluation in realistic map changes. Empirical results on Argoverse 2 and nuScenes show substantial gains over baselines (up to $+4.3$ mAP with priors and $+1.4$ mAP without priors), demonstrating the approach’s robustness and practical deployability for real-world HD map maintenance.

Abstract

Autonomous vehicles rely on HD maps for their operation, but offline HD maps eventually become outdated. For this reason, online HD map construction methods use live sensor data to infer map information instead. Research on real map changes shows that oftentimes entire parts of an HD map remain unchanged and can be used as a prior. We therefore introduce M3TR (Multi-Masking Map Transformer), a generalist approach for HD map completion both with and without offline HD map priors. As a necessary foundation, we address shortcomings in ground truth labels for Argoverse 2 and nuScenes and propose the first comprehensive benchmark for HD map completion. Unlike existing models that specialize in a single kind of map change, which is unrealistic for deployment, our Generalist model handles all kinds of changes, matching the effectiveness of Expert models. With our map masking as augmentation regime, we can even achieve a +1.4 mAP improvement without a prior. Finally, by fully utilizing prior HD map elements and optimizing query designs, M3TR outperforms existing methods by +4.3 mAP while being the first real-world deployable model for offline HD map priors. Code is available at https://github.com/immel-f/m3tr

Paper Structure

This paper contains 20 sections, 3 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Overview of the model architecture of M3TR and the investigated point query encoder designs. For our evaluated task of HD map completion, we mask out instances from the ground truth map $\mathcal{M}_{\mathrm{GT}}$ to create a map prior $\mathcal{M}_{\mathrm{P}}$. Using $\mathcal{M}_{\mathrm{P}}$, we try to reconstruct $\mathcal{M}_{\mathrm{GT}}$. The map prior instances are supplied to the model as queries, influenced by the shown point query encoder and the detection query set design, which is further illustrated in Figure 3.
  • Figure 2: Visualization of map changes from the Argoverse 2 Trust, but Verify dataset (av2_trust_but_verify), with the outdated map reprojected into the camera image. Real map changes can easily be translated into the proposed map prior scenarios.
  • Figure 3: Visualization of different detection query set designs with and without map prior $\mathcal{M}_p$. The set of queries is matched to ground truth map elements in either a one-to-one ($\mathrm{O2O}$) or one-to-many ($\mathrm{O2M}$) fashion. Compared to the baseline $\mathrm{O2M_{SMP}}$ query set design for map priors, we propose a tiling $\mathrm{O2M_{MMP}}$ design.
  • Figure 4: Visualization of the different training regimes for variable map priors investigated in this work. Compared to previous expert training regimes and a naive Generalist prior generation, our masking as augmentation regime leverages all available data for a Generalist model with improved performance.
  • Figure 5: Visualization of previous expert models vs. the Generalist model proposed in this work. The map prior scenarios $\mathcal{S}_p$ are listed in Table \ref{tab:prior_scenarios}.
  • ...and 8 more figures
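
The Figure 1 caption describes creating a map prior by masking instances out of the ground truth map. A minimal sketch of that idea follows, assuming map elements are labeled polylines. The `MapElement` container, the `mask_map` function, the category names, and the `instance_drop_prob` parameter are all hypothetical illustrations and are not taken from the M3TR code base.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class MapElement:
    """One vectorized map instance (hypothetical container, not the authors' type)."""
    category: str               # illustrative label, e.g. "divider" or "centerline"
    points: Tuple[Point, ...]   # polyline vertices in a local frame


def mask_map(
    ground_truth: Sequence[MapElement],
    masked_categories: Set[str],
    rng: Optional[random.Random] = None,
    instance_drop_prob: float = 0.0,
) -> Tuple[List[MapElement], List[MapElement]]:
    """Split a ground truth map into (prior, masked) elements.

    Elements whose category is in `masked_categories` are always masked.
    Every other element is additionally masked with probability
    `instance_drop_prob` (an assumed knob for random instance-level masking).
    """
    rng = rng or random.Random()
    prior: List[MapElement] = []
    masked: List[MapElement] = []
    for element in ground_truth:
        drop = (
            element.category in masked_categories
            or rng.random() < instance_drop_prob
        )
        (masked if drop else prior).append(element)
    return prior, masked


if __name__ == "__main__":
    gt = [
        MapElement("divider", ((0.0, 0.0), (10.0, 0.0))),
        MapElement("boundary", ((0.0, 3.5), (10.0, 3.5))),
        MapElement("centerline", ((0.0, 1.75), (10.0, 1.75))),
    ]
    # Scenario sketch: the prior lacks all centerlines, which must be completed.
    prior, to_complete = mask_map(gt, {"centerline"})
    print([e.category for e in prior], [e.category for e in to_complete])
```

In this sketch the first return value plays the role of the prior $\mathcal{M}_{\mathrm{P}}$, and the union of both return values is the ground truth $\mathcal{M}_{\mathrm{GT}}$ that a completion model would be asked to reconstruct. Sampling a different `masked_categories` set per training sample is one plausible way to vary the prior scenario during training, though the paper's actual regime may differ.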