Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping

Hao Shan, Ruikai Li, Han Jiang, Yizhe Fan, Ziyang Yan, Bohan Li, Xiaoshuai Hao, Hao Zhao, Zhiyong Cui, Yilong Ren, Haiyang Yu

TL;DR

This paper tackles the overlooked problem of temporal stability in online HD mapping for autonomous driving by introducing a dedicated stability benchmark. It proposes a multi-dimensional framework that decomposes stability into Presence, Localization, and Shape, combines them into a single mean Average Stability (mAS) score, and shows that mAS and traditional mAP behave largely independently across 42 models and variants. Extensive experiments analyze how design choices (sensor modality, BEV encoder, temporal fusion, backbone, and training regimen) affect stability, with effects that are nuanced and model-dependent. The work advocates co-designing architectures that jointly optimize accuracy and temporal stability, and the authors plan to release a public benchmark to accelerate progress toward safer, more reliable online mapping systems.

Abstract

As one of the fundamental modules in autonomous driving, online high-definition (HD) maps have attracted significant attention due to their cost-effectiveness and real-time capabilities. Since vehicles always cruise in highly dynamic environments, spatial displacement of onboard sensors inevitably causes shifts in real-time HD mapping results, and such instability poses fundamental challenges for downstream tasks. However, existing online map construction models tend to prioritize improving each frame's mapping accuracy, while the mapping stability has not yet been systematically studied. To fill this gap, this paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability, integrated into a unified mean Average Stability (mAS) score. Extensive experiments on 42 models and variants show that accuracy (mAP) and stability (mAS) represent largely independent performance dimensions. We further analyze the impact of key model design choices on both criteria, identifying architectural and training factors that contribute to high accuracy, high stability, or both. To encourage broader focus on stability, we will release a public benchmark. Our work highlights the importance of treating temporal stability as a core evaluation criterion alongside accuracy, advancing the development of more reliable autonomous driving systems. The benchmark toolkit, code, and models will be available at https://stablehdmap.github.io/.

Paper Structure

This paper contains 55 sections, 10 equations, 9 figures, 19 tables, 7 algorithms.

Figures (9)

  • Figure 1: Evaluating trustworthiness of online mapping models using human judgment, traditional mAP, and our mAS metric. In each case, the standard accuracy metric (mAP) fails to align with human judgment because it evaluates only single-frame precision, disregarding stability across time. To address this limitation, we propose the first stability benchmark for online vectorized map construction and present a large-scale analysis of contemporary models.
  • Figure 2: The Impact of Unstable Map Elements on Downstream Tasks. In Scenario A, the ego vehicle attempts to overtake, but the forward lane divider suddenly disappears during the maneuver, causing the ego vehicle to steer toward the curb. In Scenario B, another vehicle attempts to change lanes, but due to flickering lane dividers in the ego vehicle's perception, the ego vehicle interprets the other vehicle's action as a collision course.
  • Figure 3: Radar chart for Basic HD map constructors covering eight evaluation metrics. The axes of the radar chart correspond to: #1 mAS, #2 Shape, #3 Loc, #4 Presence, #5 mAP, #6 Inference Memory Cost, #7 Parameter Count, #8 FPS.
  • Figure 4: The correlations between the single-frame accuracy metrics mAP and the stability metrics Presence, Loc, Shape, and mAS. The bubble size represents the model's parameter count.
  • Figure 5: The dual effect of temporal fusion on MapTR with different BEV encoders.
  • ...and 4 more figures