EAN-MapNet: Efficient Vectorized HD Map Construction with Anchor Neighborhoods

Huiyuan Xiong, Jun Shen, Taohong Zhu, Yuelong Pan

TL;DR

This work designs query units based on the anchor neighborhoods, allowing non-neighborhood central anchors to effectively assist in fitting the neighborhood central anchors to the target points representing map elements, and proposes grouped local self-attention (GL-SA) by leveraging the relative instance relationship among the queries.

Abstract

High-definition (HD) maps are crucial for autonomous driving systems. Most existing works design map element detection heads based on the DETR decoder. However, the initial queries lack explicit incorporation of physical positional information, and vanilla self-attention entails high computational complexity. Therefore, we propose EAN-MapNet for Efficiently constructing HD maps using Anchor Neighborhoods. First, we design query units based on the anchor neighborhoods, allowing non-neighborhood central anchors to effectively assist in fitting the neighborhood central anchors to the target points representing map elements. Then, we propose grouped local self-attention (GL-SA) by leveraging the relative instance relationship among the queries. This facilitates direct feature interaction among queries of the same instance, while innovatively employing local queries as intermediaries for interaction among queries from different instances. Consequently, GL-SA significantly reduces the computational complexity of self-attention while ensuring ample feature interaction among queries. On the nuScenes dataset, EAN-MapNet achieves state-of-the-art performance with 63.0 mAP after training for 24 epochs, surpassing MapTR by 12.7 mAP. Furthermore, it reduces memory consumption by 8198M compared to MapTRv2.
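
To give a rough sense of why grouping can lower the cost of self-attention, the sketch below counts query-key score pairs for vanilla self-attention and for a three-stage grouped scheme of the kind the abstract describes. It is a back-of-the-envelope illustration, not the paper's complexity analysis: the function names, the assumption of exactly one local query per group, the inclusion of that local query among the keys of the intra-group stage, and the example sizes (50 groups of 20 queries) are all assumptions made here.

```python
def vanilla_pairs(num_groups, queries_per_group):
    """Score pairs when every query attends to every query."""
    total = num_groups * queries_per_group
    return total * total


def gl_sa_pairs(num_groups, queries_per_group):
    """Score pairs for a grouped scheme with one local query per group (assumed)."""
    # Stage 1: each local query attends to the anchor queries of its own group.
    local_extraction = num_groups * queries_per_group
    # Stage 2: vanilla self-attention among the local queries only.
    inter_group = num_groups * num_groups
    # Stage 3: inside each group, every anchor query attends to the group's
    # anchor queries plus its local query (assumed reading of the description).
    intra_group = num_groups * queries_per_group * (queries_per_group + 1)
    return local_extraction + inter_group + intra_group


# Illustrative sizes only; they are not taken from the paper.
print(vanilla_pairs(50, 20))  # 1000000
print(gl_sa_pairs(50, 20))    # 24500
```

Under these assumptions the pair count grows with the square of the group size and the square of the group count separately, rather than with the square of their product.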

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: The overall architecture of EAN-MapNet: First, images captured by the surround-view cameras are transformed into a unified BEV feature by the BEV Encoder. Then, in the decoder, the auxiliary part assists the primary part in updating the neighborhood central anchors to the target points by fitting the non-neighborhood central anchors into the ground-truth (GT) neighborhoods. GL-SA first extracts local features of anchor queries within each group using local queries. Subsequently, vanilla self-attention is applied to the local queries, efficiently enabling feature interaction among groups. Finally, each local query is assigned to its corresponding group of original query units, followed by ample feature interaction within each group.
  • Figure 2: A single query unit corresponds to two queries, each composed of the $2$-dimensional coordinates of anchors situated in the initial neighborhood, along with shared $d$-dimensional learnable parameters. The neighborhood central anchor is fitted to the target point, while the non-neighborhood central anchor is fitted to a random point within the GT neighborhood.
  • Figure 3: GT neighborhoods: We set the maximum radius $r$ of the GT neighborhoods to half of the distance between vertices, and then further reduce the radius according to $\omega$, so as to preserve the overall shape of the map elements as far as possible.
  • Figure 4: Local feature extraction: A single local query queries a group of anchor queries to extract local features.
  • Figure 5: Feature interaction within groups: In each group, each anchor query queries all queries to facilitate feature interaction within groups.
  • ...and 1 more figure
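
The GL-SA data flow described in the captions of Figures 1, 4, and 5 can be sketched in plain Python as three attention stages. This is a simplified illustration under stated assumptions rather than the authors' implementation: it uses single-head scaled dot-product attention with no learned projections, residual connections, or normalization, and the names `attend`, `gl_sa`, `groups`, and `local_queries`, as well as the toy input values, are hypothetical. Including the local query among the keys of the intra-group stage is one reading of "each anchor query queries all queries".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Single-head scaled dot-product attention; keys double as values."""
    dim = len(keys_values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, keys_values))
                        for i in range(dim)])
    return outputs


def gl_sa(groups, local_queries):
    """groups: one list of anchor-query vectors per instance (assumed layout)."""
    # Stage 1: each local query extracts local features from its own group.
    local = [attend([lq], group)[0] for lq, group in zip(local_queries, groups)]
    # Stage 2: vanilla self-attention among local queries links the groups.
    local = attend(local, local)
    # Stage 3: each anchor query attends to its group's anchor queries plus
    # the local query assigned back to that group.
    return [attend(group, group + [lq]) for group, lq in zip(groups, local)]


# Toy input: two groups with 2-dimensional query features (made-up values).
groups = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
          [[2.0, 2.0], [3.0, 1.0]]]
local_queries = [[0.0, 0.0], [0.0, 0.0]]
updated = gl_sa(groups, local_queries)
print([len(g) for g in updated])  # one updated vector per anchor query
```

In this sketch, queries from different groups never attend to each other directly; information crosses group boundaries only through the local queries in stage 2.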