MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li, Ruizi Yang, Junbo Chen, Jianke Zhu

TL;DR

MGMap tackles the challenge of precise online HD map vectorization under subtle and sparse annotations by introducing mask-guided learning. It combines an Enhanced Multi-Level BEV feature extractor with a Mask-Activated Instance (MAI) decoder and a Position-Guided Mask Patch Refinement (PG-MPR) module to perform coarse instance-level and fine-grained point-level localization. The method jointly leverages learned instance masks and binary mask features to highlight informative regions and refine point locations through ROI-based patch features, achieving substantial mAP gains across nuScenes and Argoverse2. The results demonstrate strong robustness and generalization, with notable performance improvements over state-of-the-art approaches and clear ablations confirming the contribution of each component.

Abstract

Currently, high-definition (HD) map construction is trending toward lightweight online generation, which aims to preserve timely and reliable road scene information. However, map elements contain strong shape priors, and their subtle and sparse annotations make current detection-based frameworks ambiguous in locating relevant feature scopes, causing the loss of detailed structures in predictions. To alleviate these problems, we propose MGMap, a mask-guided approach that effectively highlights the informative regions and achieves precise map element localization by introducing learned masks. Specifically, MGMap employs learned masks based on the enhanced multi-scale BEV features from two perspectives. At the instance level, we propose the Mask-Activated Instance (MAI) decoder, which incorporates global instance and structural information into instance queries through the activation of instance masks. At the point level, a novel Position-Guided Mask Patch Refinement (PG-MPR) module is designed to refine point locations from a finer-grained perspective, enabling the extraction of point-specific patch information. Compared to the baselines, our proposed MGMap achieves a notable improvement of around 10 mAP for different input modalities. Extensive experiments also demonstrate that our approach has strong robustness and generalization capabilities. Our code can be found at https://github.com/xiaolul2/MGMap.
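
The instance-level idea in the abstract, using instance masks to activate instance queries, can be illustrated with a small NumPy sketch. It is not the authors' MAI decoder. It assumes that "activation" means sigmoid-weighted pooling of BEV features under each instance mask, and the function name `mask_activated_queries` and its tensor shapes are hypothetical choices made for illustration.

```python
import numpy as np


def mask_activated_queries(bev_feat, mask_logits, eps=1e-6):
    """Pool BEV features into one query vector per instance mask.

    Illustrative assumption, not the paper's exact formulation:
    each query is the mask-weighted average of the BEV features.

    bev_feat:    (C, H, W) BEV feature map
    mask_logits: (N, H, W) per-instance mask logits
    returns:     (N, C) instance queries
    """
    c, h, w = bev_feat.shape
    n = mask_logits.shape[0]

    # Sigmoid turns logits into soft masks in (0, 1).
    masks = 1.0 / (1.0 + np.exp(-mask_logits))
    masks = masks.reshape(n, h * w)

    # Normalize each mask so its weights sum to (almost) 1.
    weights = masks / (masks.sum(axis=1, keepdims=True) + eps)

    # Weighted sum over all BEV cells: (N, HW) @ (HW, C) -> (N, C).
    feats = bev_feat.reshape(c, h * w)
    return weights @ feats.T


# Toy input: channel 0 is active on the left half, channel 1 on the right.
bev = np.zeros((2, 2, 4))
bev[0, :, :2] = 1.0
bev[1, :, 2:] = 1.0

# Two instance masks, one covering each half of the BEV grid.
logits = np.full((2, 2, 4), -20.0)
logits[0, :, :2] = 20.0
logits[1, :, 2:] = 20.0

queries = mask_activated_queries(bev, logits)
```

In this toy case each query picks up only the channel that is active under its own mask, which is the sense in which a mask highlights the informative region for an instance.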

Paper Structure

This paper contains 23 sections, 14 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: For some detailed structures, our proposed MGMap achieves effective map element localization by highlighting the informative regions through the learned masks.
  • Figure 2: Overview of the MGMap framework. MGMap consists of three main components: (1) a BEV extractor that obtains multi-scale BEV features by transforming perspective-view (PV) features to BEV with the enhanced multi-level neck; (2) a Mask-Activated Instance (MAI) decoder that constructs and updates queries at the instance level; (3) a Position-Guided Mask Patch Refinement (PG-MPR) module that refines point positions from local patch features at the point level.
  • Figure 3: Illustration of mask construction at different stages. In the MAI decoder, instance masks are generated to activate lane queries, while binary masks are extracted to provide fine-grained patch features in PG-MPR.
  • Figure 4: (a) Conventional deformable attention extracts sparse features at sampling points, which may select irrelevant features; (b) our proposed Mask Patch Refinement extracts more relevant features from a reliable patch region.
  • Figure 5: Visual results of MapTR (Liao et al., 2022), our proposed MGMap approach, and the corresponding ground truth.
  • ...and 6 more figures
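
The contrast drawn in the Figure 4 caption, reading a whole local patch around a point rather than a few sparse samples, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PG-MPR implementation: the square patch, the nearest-cell rounding, the zero padding, and the name `extract_point_patches` are all hypothetical.

```python
import numpy as np


def extract_point_patches(feat, points, patch=3):
    """Crop a (patch x patch) window of `feat` centered on each point.

    feat:    (C, H, W) feature map, e.g. binary-mask features
    points:  (P, 2) array of (x, y) in normalized [0, 1] coordinates
    returns: (P, C, patch, patch) patches, zero-padded at the borders
    """
    if patch % 2 != 1:
        raise ValueError("patch size must be odd so the point is centered")

    c, h, w = feat.shape
    r = patch // 2

    # Zero-pad the spatial dims so windows near the border stay in range.
    padded = np.pad(feat, ((0, 0), (r, r), (r, r)))

    # Map normalized coordinates to the nearest grid cell.
    cols = np.clip(np.rint(points[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.rint(points[:, 1] * (h - 1)).astype(int), 0, h - 1)

    # After padding by r, the window starting at (row, col) is centered
    # on the original cell (row, col).
    return np.stack(
        [padded[:, y:y + patch, x:x + patch] for x, y in zip(cols, rows)]
    )


# Toy 5x5 single-channel map whose value encodes the cell index.
feat = np.arange(25, dtype=float).reshape(1, 5, 5)
points = np.array([[0.5, 0.5], [0.0, 0.0]])  # grid center and top-left corner
patches = extract_point_patches(feat, points)
```

A refinement head could then pool or flatten each patch and regress a small offset for its point. That step is left out here because the source section does not specify it.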