Learning Global Representation from Queries for Vectorized HD Map Construction

Shoumeng Qiu, Xinrun Li, Yang Long, Xiangyang Xue, Varun Ojha, Jian Pu

TL;DR

This work tackles online vectorized HD map construction by identifying a limitation in DETR-like query learning: it typically emphasizes local, instance-level details rather than the global map structure. It introduces MapGR, comprising Global Representation Learning (GRL) to derive a global map embedding from all queries and Global Representation Guidance (GRG) to fuse this global context into each query, enabling holistic and local optimization simultaneously. GRL is trained to predict a rasterized BEV map derived from the ground-truth map, while GRG injects the global embedding into per-query representations for enhanced decoding. Across nuScenes and Argoverse 2, MapGR consistently improves mAP over strong baselines and achieves state-of-the-art results on nuScenes, all with minimal computational overhead, highlighting its practical scalability for online HD map construction.

Abstract

The online construction of vectorized high-definition (HD) maps is a cornerstone of modern autonomous driving systems. State-of-the-art approaches, particularly those based on the DETR framework, formulate this as an instance detection problem. However, their reliance on independent, learnable object queries results in a predominantly local query perspective, neglecting the inherent global representation within HD maps. In this work, we propose MapGR (Global Representation learning for HD Map construction), an architecture designed to learn and utilize global representations from queries. Our method introduces two synergistic modules: a Global Representation Learning (GRL) module, which encourages the distribution of all queries to better align with the global map through a carefully designed holistic segmentation task, and a Global Representation Guidance (GRG) module, which endows each individual query with explicit, global-level contextual information to facilitate its optimization. Evaluations on the nuScenes and Argoverse 2 datasets validate the efficacy of our approach, demonstrating substantial improvements in mean Average Precision (mAP) compared to leading baselines.
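
The data flow of the two modules can be illustrated with a small NumPy sketch. It is not the paper's implementation: mean pooling as the aggregator, a single linear segmentation head, additive fusion, and all sizes and names (`global_representation`, `guide_queries`, `w_seg`, `w_fuse`) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_QUERIES, DIM = 50, 32   # hypothetical sizes, not taken from the paper
BEV_H, BEV_W = 20, 10       # hypothetical raster resolution


def global_representation(queries):
    """GRL-style aggregation: pool all queries into one global embedding.

    Mean pooling is an assumption; the paper may aggregate differently.
    """
    return queries.mean(axis=0)


def predict_raster(global_emb, w_seg, b_seg):
    """Decode the global embedding into per-cell occupancy probabilities."""
    logits = global_emb @ w_seg + b_seg
    return 1.0 / (1.0 + np.exp(-logits))


def segmentation_loss(probs, target, eps=1e-7):
    """Binary cross-entropy against a rasterized ground-truth map."""
    probs = np.clip(probs, eps, 1.0 - eps)
    bce = target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)
    return float(-np.mean(bce))


def guide_queries(queries, global_emb, w_fuse):
    """GRG-style guidance: add a projected global embedding to every query."""
    return queries + global_emb @ w_fuse  # broadcasts over the query axis


queries = rng.normal(size=(NUM_QUERIES, DIM))
w_seg = rng.normal(scale=0.1, size=(DIM, BEV_H * BEV_W))
b_seg = np.zeros(BEV_H * BEV_W)
w_fuse = rng.normal(scale=0.1, size=(DIM, DIM))
# Random stand-in for a rasterized ground-truth map (illustration only).
target = (rng.random(BEV_H * BEV_W) < 0.1).astype(float)

g = global_representation(queries)
loss = segmentation_loss(predict_raster(g, w_seg, b_seg), target)
guided = guide_queries(queries, g, w_fuse)
```

In this sketch the holistic loss depends on every query through `g`, and `guided` keeps the per-query shape so it could be passed to a subsequent decoder layer unchanged.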

Paper Structure

This paper contains 22 sections, 10 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: (a) Multi-view images from on-board sensors. (b) A conventional DETR-like HD map construction pipeline. (c) Our proposed global representation learning of queries for the map construction task significantly improves query distribution from the initial to the final decoder layers. This improvement leads to smoother and more consistent curvature changes in instances, ensuring better alignment with the global structure. $Q_{i}$ represents the query set from the $i$-th decoding layer. The red box marks a region where query distribution improves significantly.
  • Figure 2: The details of our proposed method. The map encoder transforms multi-view images into a BEV embedding, while the decoder enables map queries to interact and extract information from the BEV embedding to decode vectorized map instances. The GRL module aggregates these queries into a global representation for the overall map distribution. The global representation is then used by the GRG module to enhance the query in the subsequent decoding process.
  • Figure 3: (a) Sampling- and matching-based query learning. (b) Global-representation-aided query learning. In (a), only the queries matched to a ground-truth instance receive gradients. In (b), aggregating the queries into a global embedding lets every query receive gradients derived from the global distribution prediction.
  • Figure 4: Quantitative comparison of our method with MapQR and MapTRv2 on the nuScenes validation dataset.
  • Figure 5: Experiments of MLP-based image encoding and decoding.
  • ...and 4 more figures
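
The gradient argument in the Figure 3 caption can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions rather than the paper's losses: it uses squared-error losses, a shared linear head `W`, mean pooling as the aggregator, and a hypothetical `matched` index set standing in for the result of matching.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, P = 6, 4, 3                  # queries, embedding dim, output dim (hypothetical)
Q = rng.normal(size=(N, D))        # toy query set
W = rng.normal(size=(D, P))        # toy shared linear head
matched = np.array([0, 2])         # pretend only these queries were matched
t_inst = rng.normal(size=(len(matched), P))  # per-instance targets
t_glob = rng.normal(size=P)                  # global target


def matching_grad(Q):
    """Gradient of 0.5 * sum_{i in matched} ||Q_i W - t_i||^2 w.r.t. Q."""
    grad = np.zeros_like(Q)
    grad[matched] = (Q[matched] @ W - t_inst) @ W.T
    return grad


def global_grad(Q):
    """Gradient of 0.5 * ||mean(Q) W - t_glob||^2 w.r.t. Q."""
    residual = Q.mean(axis=0) @ W - t_glob
    return np.tile(residual @ W.T / len(Q), (len(Q), 1))


# Indices of queries that receive a nonzero gradient under each loss.
rows_with_grad_matching = np.flatnonzero(np.abs(matching_grad(Q)).sum(axis=1) > 0)
rows_with_grad_global = np.flatnonzero(np.abs(global_grad(Q)).sum(axis=1) > 0)
```

Under the matching loss only the rows listed in `matched` are updated, while the pooled loss reaches all `N` rows. With plain mean pooling every query receives the same gradient; a learned aggregator would presumably differentiate them.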