Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
Hao Shan, Ruikai Li, Han Jiang, Yizhe Fan, Ziyang Yan, Bohan Li, Xiaoshuai Hao, Hao Zhao, Zhiyong Cui, Yilong Ren, Haiyang Yu
TL;DR
This paper tackles the overlooked problem of temporal stability in online HD mapping for autonomous driving by introducing a dedicated stability benchmark. It proposes a multi-dimensional framework that decomposes stability into Presence, Localization, and Shape, summarized in a single mean Average Stability (mAS) score, and shows that mAS and the traditional mAP behave largely independently across 42 models and variants. Extensive experiments reveal that representational choices (sensor modality, BEV encoder, temporal fusion, backbone, and training regimen) have nuanced, model-dependent effects on stability. The work advocates co-designing architectures that jointly optimize accuracy and temporal stability, and provides a public benchmark to accelerate progress toward safer, more reliable online mapping systems.
Abstract
As a fundamental module in autonomous driving, online high-definition (HD) map construction has attracted significant attention due to its cost-effectiveness and real-time capability. Because vehicles operate in highly dynamic environments, the spatial displacement of onboard sensors inevitably causes shifts in real-time HD mapping results, and this instability poses fundamental challenges for downstream tasks. However, existing online map construction models tend to prioritize per-frame mapping accuracy, while mapping stability has not yet been systematically studied. To fill this gap, this paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability, integrated into a unified mean Average Stability (mAS) score. Extensive experiments on 42 models and variants show that accuracy (mAP) and stability (mAS) represent largely independent performance dimensions. We further analyze the impact of key model design choices on both criteria, identifying architectural and training factors that contribute to high accuracy, high stability, or both. Our work highlights the importance of treating temporal stability as a core evaluation criterion alongside accuracy, advancing the development of more reliable autonomous driving systems. To encourage broader focus on stability, we will release a public benchmark: the toolkit, code, and models will be available at https://stablehdmap.github.io/.
