Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu; Xiaoyu Zhang; Guangwei Liu; Ji Zhao; Ningyi Xu

TL;DR

This work tackles online vectorized HD map construction for autonomous driving by rethinking query design in DETR-like architectures. It introduces MapQR, featuring a scatter-and-gather instance-query mechanism and position-aware embeddings that enable explicit per-element content sharing across multiple sample points, along with a flexible height-aware BEV encoder (GKT-h). The approach yields state-of-the-art mAP on nuScenes and Argoverse 2 while maintaining practical inference speed, and it generalizes to improve other DETR-based map construction models. The method offers a simple yet effective improvement path for end-to-end vectorized map prediction, with public code to facilitate adoption.
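The TL;DR refers to position-aware embeddings built from reference points. This summary does not specify their exact form. A common choice in DETR-family decoders is a sinusoidal encoding of normalized coordinates, often followed by a small MLP. The sketch below assumes that choice and follows DETR's sine-embedding convention. `sine_embed_2d`, `num_feats` and `temperature` are illustrative names, not MapQR's API.

```python
import numpy as np


def sine_embed_2d(points: np.ndarray, num_feats: int = 64,
                  temperature: float = 10000.0) -> np.ndarray:
    """Sinusoidal embedding of normalized 2D reference points (assumed form).

    points: array of shape (..., 2) holding (x, y) in [0, 1].
    Returns an array of shape (..., 2 * num_feats): y features, then x features.
    """
    scale = 2.0 * np.pi
    idx = np.arange(num_feats, dtype=np.float64)
    # Paired frequencies, as in DETR: channels 2k and 2k+1 share a frequency.
    dim_t = temperature ** (2.0 * (idx // 2) / num_feats)
    x = points[..., 0:1] * scale / dim_t  # (..., num_feats)
    y = points[..., 1:2] * scale / dim_t

    def sin_cos(v: np.ndarray) -> np.ndarray:
        # sin on even channels, cos on odd channels
        out = np.empty_like(v)
        out[..., 0::2] = np.sin(v[..., 0::2])
        out[..., 1::2] = np.cos(v[..., 1::2])
        return out

    return np.concatenate([sin_cos(y), sin_cos(x)], axis=-1)


# 3 instances x 4 reference points each -> one embedding per point
refs = np.random.default_rng(0).random((3, 4, 2))
print(sine_embed_2d(refs).shape)  # (3, 4, 128)
```

Because each reference point receives a different embedding, queries that share the same content can still attend to different BEV locations.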

Abstract

In autonomous driving, the high-definition (HD) map plays a crucial role in localization and planning. Recently, several methods have enabled end-to-end online map construction in DETR-like frameworks. However, little attention has been paid to the potential of the query mechanism for map elements. This paper introduces MapQR, an end-to-end method that focuses on enhancing query capabilities for constructing online vectorized maps. To probe desirable information efficiently, MapQR uses a novel query design, called the scatter-and-gather query, which explicitly models content and position as separate parts. The base map instance queries are scattered to different reference points and added to positional embeddings to probe information from BEV features. These scattered queries are then gathered back to enhance the information within each map instance. Together with a simple and effective improvement to the BEV encoder, the proposed MapQR achieves the best mean average precision (mAP) while maintaining good efficiency on both nuScenes and Argoverse 2. In addition, integrating our query design into other models boosts their performance significantly. The source code is available at https://github.com/HXMap/MapQR.
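The scatter and gather steps described in the abstract can be sketched with array shapes alone. This is a minimal illustration under stated assumptions, not the authors' implementation. The positional part is a hypothetical linear projection `w_pos` of the reference points. The gather step is assumed to concatenate the P point queries of an instance and project them back to C channels with a hypothetical `w_gather`; the paper may use a different operator. The cross-attention against BEV features that would sit between the two steps is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def scatter(instance_queries, ref_points, w_pos):
    """Scatter each instance query to its P reference points.

    instance_queries: (N, C) shared content part, one row per map instance.
    ref_points:       (N, P, 2) normalized BEV coordinates in [0, 1].
    w_pos:            (2, C) hypothetical linear positional projection.
    Returns (N, P, C): shared content plus a per-point positional part.
    """
    n, c = instance_queries.shape
    p = ref_points.shape[1]
    content = np.broadcast_to(instance_queries[:, None, :], (n, p, c))
    position = ref_points @ w_pos  # (N, P, C), differs per point
    return content + position


def gather(point_queries, w_gather):
    """Gather P point queries back into one query per instance.

    Assumption: concatenate along the point axis, then apply a linear
    map (P*C) -> C. Mean pooling would be a parameter-free alternative.
    """
    n, p, c = point_queries.shape
    return point_queries.reshape(n, p * c) @ w_gather


N, P, C = 3, 4, 8  # instances, reference points per instance, channels
inst = rng.standard_normal((N, C))
refs = rng.random((N, P, 2))
w_pos = rng.standard_normal((2, C))
w_gather = rng.standard_normal((P * C, C)) / np.sqrt(P * C)

scattered = scatter(inst, refs, w_pos)
# Cross-attention against BEV features would update `scattered` here (omitted).
gathered = gather(scattered, w_gather)
print(scattered.shape, gathered.shape)  # (3, 4, 8) (3, 8)
```

Subtracting the positional part from any scattered query recovers the same instance content, which is the property that lets all points of one instance share content.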

Paper Structure (22 sections, 9 equations, 10 figures, 10 tables)

Introduction
Related Work
Online Vectorized HD Map Construction
Multi-View Camera-to-BEV Transformation
Detection Transformers
Method
Overall Architecture
Decoder with Scatter-and-Gather Query
BEV Encoder: GKT with Flexible Height
Matching Cost and Training Loss
Experiments
Settings
Comparisons with State-of-the-art Methods
Ablation Study
Conclusion
...and 7 more sections

Figures (10)

Figure 1: Comparison of overall architectures. Left: A DETR-like architecture used in many map construction methods. Right: The proposed architecture with scatter-and-gather query and positional embedding. Each instance is explicitly modelled by a shared content part and distinct position parts. The information within a single instance is also enhanced by the gathering operation. In addition, reference points are used for the positional embedding of these queries.
Figure 2: The overall architecture of our method. It contains three main components: a shared image backbone to extract image features, a view transformation module to obtain BEV features, and a transformer decoder to generate predictions. The backbone and view transformation module can be any popular choice without additional adaptation. The decoder is our key design, and in principle it can be applied directly to other DETR-like map construction models.
Figure 3: Comparison of decoders. Left: The decoder of MapTR [liao2022maptr]. Right: The proposed decoder of MapQR. In this example, each instance contains $4$ reference points.
Figure 4: Qualitative comparison with state-of-the-art methods. The images are taken from the nuScenes dataset. Orange, blue and green represent lane dividers, pedestrian crossings and road boundaries, respectively. The proposed method produces more accurate maps. All methods use an R50 backbone and are trained for $110$ epochs.
Figure 5: Output of the first two decoder layers. Points and instances are colored by their predicted labels. Left: In MapTRv2, instance labels are predicted from the average of point queries. Point queries within the same instance may have different labels, resulting in content conflict (see red circles). Right: Since the proposed SGQ ensures shared content within the same instance, MapQR avoids such conflict entirely.
...and 5 more figures
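The content conflict in Figure 5 follows from where labels are read out. The toy sketch below contrasts the two readings structurally. It uses a hypothetical random linear classifier `w_cls` and random queries (no trained weights, no real data), so it illustrates only the mechanism, not the figure's results. When every point query carries its own content, per-point labels can disagree within an instance. When a single label comes from the shared instance content, all points of that instance inherit it.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, C, K = 2, 4, 8, 3  # instances, points per instance, channels, classes
w_cls = rng.standard_normal((C, K))  # hypothetical linear classifier


def point_query_labels(point_q: np.ndarray):
    """MapTRv2-style reading: every point query has its own content."""
    per_point = (point_q @ w_cls).argmax(-1)  # (N, P): may disagree within an instance
    per_instance = (point_q.mean(1) @ w_cls).argmax(-1)  # (N,): from averaged queries
    return per_point, per_instance


def sgq_labels(instance_q: np.ndarray, num_points: int) -> np.ndarray:
    """SGQ-style reading: one label from the shared instance content,
    repeated for every point, so points of one instance cannot disagree."""
    labels = (instance_q @ w_cls).argmax(-1)  # (N,)
    return np.repeat(labels[:, None], num_points, axis=1)  # (N, P)


per_point, per_instance = point_query_labels(rng.standard_normal((N, P, C)))
shared = sgq_labels(rng.standard_normal((N, C)), P)
```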