HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

Yi Zhou, Hui Zhang, Jiaqian Yu, Yifan Yang, Sangil Jung, Seung-In Park, ByungIn Yoo

TL;DR

HIMap introduces HIQuery, a hybrid representation that encodes both point-level positions and element-level shapes for end-to-end vectorized HD map construction. A multi-layer Hybrid Decoder with a point-element interactor enables mutual refinement between points and element geometry, while a point-element consistency constraint reinforces cross-level alignment. The approach delivers state-of-the-art performance on nuScenes and Argoverse2, demonstrating particularly strong gains over prior point-only methods and sequentially combined element-level methods. The framework supports multi-modality inputs and extensions to 3D maps and centerlines, with ablations confirming the value of shared position embeddings and integrated cross-level updates. This work advances accurate, end-to-end vectorized HD map construction with practical implications for autonomous driving perception and planning.

Abstract

Vectorized High-Definition (HD) map construction requires predictions of the category and point coordinates of map elements (e.g. road boundary, lane divider, pedestrian crossing, etc.). State-of-the-art methods are mainly based on point-level representation learning for regressing accurate point coordinates. However, this pipeline has limitations in obtaining element-level information and handling element-level failures, e.g. erroneous element shape or entanglement between elements. To tackle the above issues, we propose a simple yet effective HybrId framework named HIMap to sufficiently learn and interact both point-level and element-level information. Concretely, we introduce a hybrid representation called HIQuery to represent all map elements, and propose a point-element interactor to interactively extract and encode the hybrid information of elements, e.g. point position and element shape, into the HIQuery. Additionally, we present a point-element consistency constraint to enhance the consistency between the point-level and element-level information. Finally, the output point-element integrated HIQuery can be directly converted into map elements' class, point coordinates, and mask. We conduct extensive experiments and consistently outperform previous methods on both nuScenes and Argoverse2 datasets. Notably, our method achieves $77.8$ mAP on the nuScenes dataset, remarkably superior to previous SOTAs by $8.3$ mAP at least.
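The hybrid representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tensor shapes, the function names `make_hybrid_query` and `toy_consistency_loss`, the mean pooling of point queries, and the squared-error penalty are all assumptions chosen only to show what "point-level plus element-level information per map element" and "consistency between the two levels" could look like in code.

```python
import numpy as np

def make_hybrid_query(num_elements, points_per_element, dim, seed=0):
    """Build a toy hybrid query: per-point vectors plus one vector per element.

    Hypothetical layout (an assumption, not taken from the paper):
      point_q   has shape (num_elements, points_per_element, dim)
      element_q has shape (num_elements, dim)
    """
    rng = np.random.default_rng(seed)
    point_q = rng.standard_normal((num_elements, points_per_element, dim))
    element_q = rng.standard_normal((num_elements, dim))
    return point_q, element_q

def toy_consistency_loss(point_q, element_q):
    """Penalize disagreement between the two levels of one element.

    Assumption: the points of an element are pooled by a plain mean into a
    pseudo element vector, which is compared to element_q by mean squared error.
    """
    pseudo_element = point_q.mean(axis=1)  # (num_elements, dim)
    return float(np.mean((pseudo_element - element_q) ** 2))

# Illustrative sizes only: 50 elements, 20 points each, 256 channels.
point_q, element_q = make_hybrid_query(num_elements=50, points_per_element=20, dim=256)
loss = toy_consistency_loss(point_q, element_q)
```

In this toy setup the loss is zero exactly when each element vector equals the mean of its point vectors, which is one simple way to read "consistency between the point-level and element-level information".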

Paper Structure (17 sections, 7 equations, 8 figures, 14 tables)


Figures (8)

  • Figure 1: Examples of previous failures and our improved results. Compared with the previous point-level representation learning pipeline [liao2023maptrv2], our proposed hybrid representation learning method generates richer details and more accurate element shapes, and avoids inter-element entanglement. Best viewed in color.
  • Figure 2: Illustration of our motivation for point-element interaction. Previous works [zhang2023online, qiao2023end, ding2023pivotnet] usually lack interaction between point and element, easily leading to either an incomplete element shape or inaccurate point positions. With the point-element interaction based on hybrid representation (shortened to rep.), our method achieves a more complete shape and accurate positions simultaneously.
  • Figure 3: Overview of the HIMap. Top: The pipeline of HIMap, consisting of a BEV feature extractor and a hybrid decoder. It takes multi-view images as input and outputs vectorized map elements in an end-to-end fashion. Bottom: Detailed process of the point-element interactor, which interactively extracts both point-level and element-level information of map elements, and the point-element consistency for enhancing the information consistency inside an element and the discrimination between elements. Best viewed in color.
  • Figure 4: Attention maps of HIQuery at different layers. Attention maps are overlaid on the GT. The darker the color, the greater the attention value. Best viewed zoomed in and in color.
  • Figure S1: Attention maps of HIQuery at different layers. Attention maps are overlaid on the GT. The darker the color, the greater the attention value. Best viewed zoomed in and in color.
  • ...and 3 more figures