MapRF: Weakly Supervised Online HD Map Construction via NeRF-Guided Self-Training
Hongyu Lyu, Thomas Monninger, Julie Stephany Berrio Perez, Mao Shan, Zhenxing Ming, Stewart Worrall
TL;DR
MapRF addresses online HD map construction without expensive 3D annotations by training from 2D image labels only. It introduces a Map-Conditioned NeRF (MC-NeRF), which generates pseudo labels by jointly modeling geometry, semantics, and instances conditioned on map predictions, and a Map-to-Ray Matching (MRM) strategy, which reduces concept drift during self-training. The pipeline has three stages: weak-label initialization, NeRF-based pseudo-label generation, and iterative self-training. With it, MapRF achieves competitive results on Argoverse 2 and nuScenes, reaching about 50% mAP on BEV/3D maps without 3D labels and outperforming several 2D-label baselines. These findings demonstrate the practicality of scalable, cost-efficient online HD map construction for autonomous driving and open avenues for further improvement with alternative 3D representations.
Abstract
Autonomous driving systems benefit from high-definition (HD) maps that provide critical information about road infrastructure. The online construction of HD maps offers a scalable approach to generate local maps from on-board sensors. However, existing methods typically rely on costly 3D map annotations for training, which limits their generalization and scalability across diverse driving environments. In this work, we propose MapRF, a weakly supervised framework that learns to construct 3D maps using only 2D image labels. To generate high-quality pseudo labels, we introduce a novel Neural Radiance Fields (NeRF) module conditioned on map predictions, which reconstructs view-consistent 3D geometry and semantics. These pseudo labels are then used iteratively to refine the map network in a self-training manner, enabling progressive improvement without additional supervision. Furthermore, to mitigate error accumulation during self-training, we propose a Map-to-Ray Matching strategy that aligns map predictions with camera rays derived from 2D labels. Extensive experiments on the Argoverse 2 and nuScenes datasets demonstrate that MapRF achieves performance comparable to fully supervised methods, attaining around 75% of the fully supervised baseline's performance while surpassing several approaches that use only 2D labels. This highlights the potential of MapRF to enable scalable and cost-effective online HD map construction for autonomous driving.
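To make the idea of aligning map predictions with camera rays more concrete, the sketch below shows the underlying geometry only: a labeled 2D pixel is back-projected to a ray, and a predicted 3D map point is scored by its distance to that ray. It is not the paper's Map-to-Ray Matching procedure, which is not specified in this summary. The pinhole camera model, the function names (`pixel_to_ray`, `point_to_ray_distance`, `points_to_rays_cost`), and the nearest-point averaging cost are all assumptions made for illustration.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit ray direction in the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def point_to_ray_distance(point, origin, direction):
    """Shortest distance from a 3D point to a ray (origin + t * direction, t >= 0).

    `direction` must be a unit vector.
    """
    rel = tuple(p - o for p, o in zip(point, origin))
    # Projection length along the ray, clamped so points behind the
    # camera are measured against the ray origin instead.
    t = max(0.0, sum(r * d for r, d in zip(rel, direction)))
    closest = tuple(o + t * d for o, d in zip(origin, direction))
    return math.dist(point, closest)

def points_to_rays_cost(points, origin, rays):
    """Hypothetical alignment cost: for each labeled ray, take the distance
    to the nearest predicted map point, then average over all rays."""
    return sum(
        min(point_to_ray_distance(p, origin, r) for p in points)
        for r in rays
    ) / len(rays)

# Usage: a pixel at the principal point maps to the optical axis.
ray = pixel_to_ray(320, 240, fx=500, fy=500, cx=320, cy=240)
cost = points_to_rays_cost([(0.0, 0.0, 3.0), (2.0, 0.0, 4.0)], (0.0, 0.0, 0.0), [ray])
```

A cost of this kind is low when predicted map elements lie on the rays implied by the 2D labels, which is the sense in which 2D supervision can constrain a 3D prediction without fixing its depth along the ray.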
