ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction

Jingyi Yu, Zizhao Zhang, Shengfu Xia, Jizhang Sang

TL;DR

This work addresses online long-range vectorized HD map construction from multi-view cameras by exploiting the structural constraints of map elements. It introduces ScalableMap, which combines structure-guided BEV feature extraction, a Hierarchical Sparse Map Representation (HSMR), a progressive DETR-inspired decoder, and progressive supervision to produce accurate, scalable vectorized maps at long ranges. On nuScenes, the method surpasses the previous state-of-the-art model by 6.5 mAP while maintaining real-time throughput of 18.3 FPS. Overall, ScalableMap combines structural priors, density-adaptive representations, and staged supervision to build online vectorized HD maps from camera data.

Abstract

We propose a novel end-to-end pipeline for online long-range vectorized high-definition (HD) map construction using on-board camera sensors. The vectorized representation of HD maps, which uses polylines and polygons to represent map elements, is widely used by downstream tasks. However, previous schemes designed with reference to dynamic object detection overlook the structural constraints within linear map elements, resulting in performance degradation in long-range scenarios. In this paper, we exploit the properties of map elements to improve the performance of map construction. We extract more accurate bird's-eye-view (BEV) features guided by their linear structure. We then propose a hierarchical sparse map representation to further leverage the scalability of vectorized map elements, and design a progressive decoding mechanism and a supervision strategy based on this representation. Our approach, ScalableMap, demonstrates superior performance on the nuScenes dataset, especially in long-range scenarios, surpassing the previous state-of-the-art model by 6.5 mAP while achieving 18.3 FPS. Code is available at https://github.com/jingy1yu/ScalableMap.
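
The abstract does not spell out how the hierarchical sparse map representation is built, so the following is only an illustrative sketch of the general idea of describing one polyline at several vertex densities. It assumes uniform arc-length resampling and a vertex count that roughly doubles per level; the function names (`resample_polyline`, `hierarchical_levels`) and the parameters `base_vertices` and `num_levels` are hypothetical and are not taken from the ScalableMap code.

```python
import math

def resample_polyline(points, num_vertices):
    """Resample a 2D polyline to num_vertices points evenly spaced by arc length."""
    if num_vertices < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output vertices")
    # Cumulative arc length at every input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for i in range(num_vertices):
        target = total * i / (num_vertices - 1)
        # Advance to the input segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def hierarchical_levels(points, base_vertices=3, num_levels=3):
    """Assumed scheme: level k has (base_vertices - 1) * 2**k + 1 vertices,
    so every coarse vertex reappears at each finer level."""
    return [
        resample_polyline(points, (base_vertices - 1) * 2**k + 1)
        for k in range(num_levels)
    ]

# A hypothetical lane divider spanning a long-range field of view (metres).
polyline = [(0.0, -60.0), (0.0, 0.0), (4.0, 60.0)]
levels = hierarchical_levels(polyline)
print([len(level) for level in levels])  # [3, 5, 9]
```

Under this assumed scheme, the coarse levels are subsets of the finer ones, so a decoder could refine a sparse prediction by inserting vertices between existing ones rather than re-predicting the whole element.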

Paper Structure

This paper contains 35 sections, 4 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Overview of ScalableMap. (a) Structure-guided hybrid BEV feature extractor. (b) Hierarchical sparse map representation & Progressive decoder. (c) Progressive supervision.
  • Figure 2: Visualization of progressive polyline loss.
  • Figure 3: Qualitative results of ScalableMap in challenging scenes from the nuScenes validation set. The left column shows the surround views, the middle column shows the inference results of ScalableMap, and the right column shows the corresponding ground truth. Green lines indicate boundaries, red lines indicate lane dividers, and blue lines indicate pedestrian crossings.
  • Figure 4: Visualization of predictions from three decoder layers of MapTR* and ScalableMap. The perception range along the Y-axis is $[-60.0\,\mathrm{m}, 60.0\,\mathrm{m}]$. The light-colored lines on the image represent ground truth, while the dark-colored lines represent the inference results.
  • Figure 5: Visualization of convergence curves.
  • ...and 3 more figures