Improving Hierarchical Representations of Vectorized HD Maps with Perspective Clues

Chi Zhang, Qi Song, Feifei Li, Jie Li, Rui Huang

TL;DR

PerCMap addresses the challenge of constructing vectorized HD maps from surround-view images by reusing perspective-view clues at both the instance and point levels. It introduces Cross-view Instance Activation (CIA) to produce instance-aware queries from multi-view PV features and Dual-view Point Embedding (DPE) to create input-aware positional embeddings by fusing PV and BEV information, mitigating information loss from PV-to-BEV transformations. The method is reinforced by a heatmap-based integration and a rasterized instance segmentation loss, yielding consistent gains across nuScenes and Argoverse 2 benchmarks and demonstrating robustness to weather, lighting, and PV-to-BEV modules. Overall, PerCMap advances vectorized HD map construction by preserving visual priors and enabling more accurate geometry and topology recovery, with practical implications for reliable autonomous driving mapping pipelines.
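To make the idea of instance-aware queries concrete, the following is a minimal sketch of activation-weighted pooling over several perspective views. It is not the authors' implementation: the function name `activate_instance_queries`, the sigmoid activation, and the normalized weighted average are all assumptions chosen for illustration, and the per-instance logits are taken as given rather than predicted by a network.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to an activation in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def activate_instance_queries(pv_features, activation_logits, eps=1e-6):
    """Pool PV features into one query vector per instance (hypothetical sketch).

    pv_features: list over views; each view is a list of pixel feature
        vectors of length C.
    activation_logits: list over views; each view is a list over K
        instances, each holding one logit per pixel of that view.
    Returns a list of K query vectors of length C.
    """
    num_instances = len(activation_logits[0])
    channels = len(pv_features[0][0])
    sums = [[0.0] * channels for _ in range(num_instances)]
    weights = [0.0] * num_instances

    # Accumulate activation-weighted features across every view and pixel.
    for view_feats, view_logits in zip(pv_features, activation_logits):
        for k in range(num_instances):
            for pixel_feat, logit in zip(view_feats, view_logits[k]):
                w = sigmoid(logit)
                weights[k] += w
                for c in range(channels):
                    sums[k][c] += w * pixel_feat[c]

    # Normalize so each query is a weighted average of pixel features.
    return [
        [s / (weights[k] + eps) for s in sums[k]]
        for k in range(num_instances)
    ]


# Two views, two pixels each, two channels, one instance.
features = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
logits = [[[20.0, -20.0]], [[-20.0, -20.0]]]
queries = activate_instance_queries(features, logits)
```

In this toy input only the first pixel of the first view is strongly activated, so the resulting query is dominated by that pixel's feature. A query built this way depends on the input images, in contrast to a fixed learned embedding.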

Abstract

The construction of vectorized High-Definition (HD) maps from onboard surround-view cameras has become a significant focus in autonomous driving. However, current map vector estimation pipelines face two key limitations: input-agnostic queries struggle to capture complex map structures, and the view transformation leads to information loss. These issues often result in inaccurate shape restoration or missing instances in map predictions. To address these issues, we propose a novel approach, PerCMap, which explicitly exploits clues from perspective-view features at both the instance and point levels. Specifically, at the instance level, we propose Cross-view Instance Activation (CIA) to activate instance queries across surround-view images, thereby helping the model recover the instance attributes of map vectors. At the point level, we design Dual-view Point Embedding (DPE), which fuses features from both views to generate input-aware positional embeddings and improve the accuracy of point coordinate estimation. Extensive experiments on nuScenes and Argoverse 2 demonstrate that PerCMap achieves strong and consistent performance across benchmarks, reaching 67.1 and 70.5 mAP, respectively.
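The notion of an input-aware positional embedding can be illustrated with a small sketch. It assumes that PV features have already been projected onto the BEV grid, and that fusion is a plain element-wise sum of a sinusoidal coordinate encoding with features sampled from both maps. The names `bilinear_sample`, `sinusoidal_encoding`, and `dual_view_point_embedding` are hypothetical, and the actual DPE module may fuse the two views differently.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample an H x W x C nested-list map at normalized (x, y) in [0, 1]."""
    h, w = len(feature_map), len(feature_map[0])
    px, py = x * (w - 1), y * (h - 1)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = px - x0, py - y0
    channels = len(feature_map[0][0])
    return [
        (1 - ax) * (1 - ay) * feature_map[y0][x0][c]
        + ax * (1 - ay) * feature_map[y0][x1][c]
        + (1 - ax) * ay * feature_map[y1][x0][c]
        + ax * ay * feature_map[y1][x1][c]
        for c in range(channels)
    ]


def sinusoidal_encoding(x, y, dim):
    """Fixed sin/cos encoding of a 2D point; dim must be divisible by 4."""
    quarter = dim // 4
    enc = []
    for coord in (x, y):
        for i in range(quarter):
            freq = 1.0 / (10000 ** (i / quarter))
            enc.extend([math.sin(coord * freq), math.cos(coord * freq)])
    return enc


def dual_view_point_embedding(bev_map, pv_on_bev_map, points):
    """Per-point embedding = coordinate encoding + BEV feature + PV feature.

    Summation is an assumed fusion rule used only for illustration.
    """
    embeddings = []
    for x, y in points:
        bev = bilinear_sample(bev_map, x, y)
        pv = bilinear_sample(pv_on_bev_map, x, y)
        pos = sinusoidal_encoding(x, y, len(bev))
        embeddings.append([p + b + v for p, b, v in zip(pos, bev, pv)])
    return embeddings


# Toy 2 x 2 maps with 4 channels: BEV features are zeros, PV features are ones.
bev_map = [[[0.0] * 4 for _ in range(2)] for _ in range(2)]
pv_map = [[[1.0] * 4 for _ in range(2)] for _ in range(2)]
point_embeddings = dual_view_point_embedding(bev_map, pv_map, [(0.0, 0.0)])
```

Because the sampled features change with the input images, two scenes yield different embeddings for the same point coordinates, which is the property that distinguishes this kind of embedding from a purely coordinate-based one.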

Paper Structure (33 sections, 12 equations, 7 figures, 9 tables)

This paper contains 33 sections, 12 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Abstract pipeline and prediction samples of PerCMap. In (a), $\dashrightarrow$ indicates the flow of the traditional sequential pipeline, whereas $\bm{\rightarrow}$ indicates our redesigned flow that reuses PV features at both instance and point levels. In (b), MapTRv2 [liao2023maptrv2] fails to preserve instance clues and shape priors in the circled regions, while PerCMap accurately detects both lane markings and boundary shapes.
  • Figure 2: Overall architecture of PerCMap. The proposed Cross-view Instance Activation (CIA) and Dual-view Point Embedding (DPE) enhance vector prediction at the instance and point levels, respectively. Unlike the typical sequential pipeline, PerCMap effectively reuses PV features to improve prediction accuracy and map construction quality. "PE" refers to positional embedding.
  • Figure 3: Process and visualized examples of Cross-view Instance Activation. (a) CIA extracts and aggregates instance features from multiple views to generate the Activated Instance Query. (b) Heatmap of the Instance Map $F_{In}$, where high intensities align with map element regions.
  • Figure 4: Workflow of Dual-view Point Embedding. Given the features of both views, DPE explicitly fuses them, and the intermediate Integrated Heatmap is supervised. BEV features are thereby enhanced through projection and supervision.
  • Figure 5: Qualitative results on the nuScenes validation set. We provide the complete surround-view inputs, the predictions of MapTRv2 [liao2023maptrv2] and PerCMap, and the ground-truth map. Colored ellipses highlight instances that are difficult to detect.
  • ...and 2 more figures