Online Vectorized HD Map Construction using Geometry

Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin, Xiangyu Yue

TL;DR

GeMap addresses online vectorized HD map construction by introducing G-Representation, which encodes rotation- and translation-invariant geometry of map instances through Euclidean Shape Clues and Euclidean Relation Clues. A Geometry-Decoupled Attention mechanism decouples shape and relation learning within a geometry-focused decoder, optimized by Euclidean Loss that jointly enforces accurate intra-instance shapes and inter-instance relations. The approach achieves state-of-the-art results on NuScenes and Argoverse 2, with camera-only mAPs of 69.4% and 71.8% respectively, and demonstrates robustness to occlusion and rigid transformations, paving the way for more reliable downstream prediction and planning tasks. These findings highlight the practical value of explicit geometric priors in end-to-end vectorized HD map construction and hint at broader applicability to other perception-and-planning challenges in autonomous driving.

Abstract

The construction of online vectorized High-Definition (HD) maps is critical for downstream prediction and planning. Recent efforts have built strong baselines for this task; however, the shapes and relations of instances in urban road systems, such as parallelism, perpendicularity, and rectangular shapes, are still under-explored. In our work, we propose GeMap ($\textbf{Ge}$ometry $\textbf{Map}$), which learns Euclidean shapes and relations of map instances end-to-end, going beyond basic perception. Specifically, we design a geometric loss based on angle and distance clues, which is robust to rigid transformations. We also decouple self-attention to handle Euclidean shapes and relations independently. Our method achieves new state-of-the-art performance on the NuScenes and Argoverse 2 datasets. Remarkably, it reaches a 71.8% mAP on the large-scale Argoverse 2 dataset, outperforming MapTR V2 by +4.4% and surpassing the 70% mAP threshold for the first time. Code is available at https://github.com/cnzzx/GeMap.

Paper Structure (27 sections, 20 equations, 10 figures, 10 tables)

Figures (10)

  • Figure 1: Geometric Invariance. (a) As the ego vehicle moves, after rotation ($\leftarrow$) and translation ($\rightarrow$), the shape of the crossing and the parallelism between lanes remain unchanged, which shows that geometry is invariant to rigid transformations. (b) Absolute coordinates are vulnerable to rotation and translation, whereas our G-Representation is invariant and therefore better suited to capturing geometric properties.
  • Figure 2: Geometric properties and G-Representation. (a) Geometry in the transportation road system. (b) (c) We propose to model the geometric properties of a single map instance and multiple instances, with magnitude $d$ and angle $\alpha$.
  • Figure 3: Illustration of our framework. First, perspective-view (PV) images are transformed into bird's-eye-view (BEV) features, then a Geometry-Decoupled Decoder outputs the vectorized HD map. In each block of the decoder, queries are first processed by Euclidean shape and relation attention, which focuses on geometric relevance. Finally, predictions are enhanced in G-Representations by shape and relation constraints.
  • Figure 4: Euclidean Shape Clues. Magnitudes of displacement vectors and angles between neighboring vectors indicate shape clues and are utilized to compute shape loss. The right part shows how to connect Euclidean Shape Clues to shape geometry.
  • Figure 5: Euclidean Relation Clues. Angles between pairs of displacement vectors on different polylines, and magnitudes of displacement vectors between point pairs indicate relation clues. Such relation clues are more superficially connected to Euclidean relation geometry as shown in the boxes.
  • ...and 5 more figures
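
The captions for Figures 1, 4 and 5 describe clues built from displacement-vector magnitudes and angles, and claim that these are invariant to rotation and translation. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the use of unsigned angles, the all-pairs enumeration in `relation_clues`, and the toy coordinates are all assumptions made for illustration.

```python
import math

def displacements(polyline):
    """Displacement vectors between consecutive points of a 2D polyline."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(polyline, polyline[1:])]

def angle_between(u, v):
    """Unsigned angle in radians between two 2D vectors (assumed convention)."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(math.atan2(cross, dot))

def shape_clues(polyline):
    """Intra-instance clues: vector magnitudes and angles between neighboring vectors."""
    vecs = displacements(polyline)
    lengths = [math.hypot(dx, dy) for dx, dy in vecs]
    angles = [angle_between(u, v) for u, v in zip(vecs, vecs[1:])]
    return lengths, angles

def relation_clues(poly_a, poly_b):
    """Inter-instance clues: point-pair distances and vector-pair angles (all pairs)."""
    dists = [math.hypot(q[0] - p[0], q[1] - p[1]) for p in poly_a for q in poly_b]
    angles = [angle_between(u, v)
              for u in displacements(poly_a) for v in displacements(poly_b)]
    return dists, angles

def rigid_transform(polyline, theta, tx, ty):
    """Rotate every point by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in polyline]

def all_close(a, b, tol=1e-9):
    """Element-wise comparison of two equal-length lists of floats."""
    return len(a) == len(b) and all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))

# Toy scene (hypothetical coordinates): three sides of a rectangular crossing and a lane.
crossing = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
lane = [(0.0, 5.0), (4.0, 5.0)]

# Simulate ego motion: the same rigid transform is applied to every instance.
moved_crossing = rigid_transform(crossing, 0.7, 3.0, -1.5)
moved_lane = rigid_transform(lane, 0.7, 3.0, -1.5)

lengths, angles = shape_clues(crossing)
moved_lengths, moved_angles = shape_clues(moved_crossing)
dists, rel_angles = relation_clues(crossing, lane)
moved_dists, moved_rel_angles = relation_clues(moved_crossing, moved_lane)

print(all_close(lengths, moved_lengths), all_close(angles, moved_angles))
print(all_close(dists, moved_dists), all_close(rel_angles, moved_rel_angles))
```

In this toy scene the absolute coordinates of every point change under the transform, while the side lengths, the right angles of the crossing, and the zero angle between the lane and the crossing's first edge do not. A loss defined on such clues would therefore be unaffected by the ego vehicle's pose, which is the property the captions attribute to G-Representation.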