HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi Zhou, Hui Zhang, Jiaqian Yu, Yifan Yang, Sangil Jung, Seung-In Park, ByungIn Yoo
TL;DR
HIMap introduces HIQuery, a hybrid representation that encodes both point-level positions and element-level shapes for end-to-end vectorized HD map construction. A multi-layer Hybrid Decoder with a point-element interactor lets point-level and element-level information refine each other, while a point-element consistency constraint reinforces alignment between the two levels. The approach achieves state-of-the-art results on nuScenes and Argoverse2, reaching 77.8 mAP on nuScenes, at least 8.3 mAP above previous state-of-the-art methods, with gains over both prior point-only methods and sequentially combined element-level methods. The framework supports multi-modality inputs and extends to 3D maps and centerlines, and ablations confirm the value of shared position embeddings and integrated cross-level updates. More accurate end-to-end vectorized HD maps are directly relevant to autonomous driving perception and planning.
Abstract
Vectorized High-Definition (HD) map construction requires predicting the category and point coordinates of map elements (e.g., road boundaries, lane dividers, and pedestrian crossings). State-of-the-art methods are mainly based on point-level representation learning, which regresses accurate point coordinates. However, this pipeline is limited in capturing element-level information and in handling element-level failures, e.g., erroneous element shapes or entanglement between elements. To tackle these issues, we propose a simple yet effective HybrId framework named HIMap that learns both point-level and element-level information and lets the two interact. Concretely, we introduce a hybrid representation called HIQuery to represent all map elements, and propose a point-element interactor that interactively extracts the hybrid information of elements, e.g., point position and element shape, and encodes it into the HIQuery. Additionally, we present a point-element consistency constraint to enhance the consistency between the point-level and element-level information. Finally, the output point-element integrated HIQuery can be directly converted into each map element's class, point coordinates, and mask. We conduct extensive experiments and consistently outperform previous methods on both the nuScenes and Argoverse2 datasets. Notably, our method achieves $77.8$ mAP on the nuScenes dataset, at least $8.3$ mAP higher than previous state-of-the-art methods.
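To make the last step of the pipeline concrete, the following is a minimal sketch of how one hybrid query might be converted into a class, point coordinates, and a mask. It is an illustration under stated assumptions, not the authors' implementation: the `HybridQuery` layout (several point embeddings plus one element embedding), the `decode` function, the linear heads, the dot-product mask against bird's-eye-view (BEV) features, and all sizes (`C`, `P`, `H`, `W`) are hypothetical choices made for this example. Only the three output types and the three example categories come from the abstract.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HybridQuery:
    """Toy stand-in for one hybrid query (layout is an assumption)."""
    point_feats: list   # P point-level embeddings, each of length C
    element_feat: list  # one element-level embedding of length C


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(query, w_cls, w_pts, bev_feats):
    """Convert one hybrid query into (class logits, points, mask)."""
    # Element-level embedding -> one logit per map-element category.
    class_logits = linear(w_cls, query.element_feat)
    # Each point-level embedding -> normalized (x, y) in (0, 1).
    points = [tuple(sigmoid(v) for v in linear(w_pts, p))
              for p in query.point_feats]
    # Element embedding dotted with every BEV cell -> mask probability.
    mask = [[sigmoid(sum(a * b for a, b in zip(query.element_feat, cell)))
             for cell in row]
            for row in bev_feats]
    return class_logits, points, mask


def random_vec(rng, n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical sizes: embedding width, points per element, BEV grid.
C, P, H, W = 8, 20, 4, 6
CLASSES = ["road boundary", "lane divider", "pedestrian crossing"]

rng = random.Random(0)
query = HybridQuery([random_vec(rng, C) for _ in range(P)],
                    random_vec(rng, C))
w_cls = [random_vec(rng, C) for _ in CLASSES]   # class head weights
w_pts = [random_vec(rng, C) for _ in range(2)]  # (x, y) head weights
bev = [[random_vec(rng, C) for _ in range(W)] for _ in range(H)]

class_logits, points, mask = decode(query, w_cls, w_pts, bev)
predicted = CLASSES[class_logits.index(max(class_logits))]
```

In this sketch the class and the mask are read from the element-level part of the query and the coordinates from the point-level part, which is one simple way to picture why keeping the two levels consistent with each other matters.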
