MapRF: Weakly Supervised Online HD Map Construction via NeRF-Guided Self-Training

Hongyu Lyu, Thomas Monninger, Julie Stephany Berrio Perez, Mao Shan, Zhenxing Ming, Stewart Worrall

TL;DR

MapRF addresses the problem of online HD map construction without expensive 3D annotations by leveraging 2D image labels. It introduces a Map-Conditioned NeRF (MC-NeRF) to produce pseudo labels by jointly modeling geometry, semantics, and instances conditioned on map predictions, and a Map-to-Ray Matching (MRM) strategy to reduce concept drift during self-training. Through a three-stage pipeline—weak-label initialization, NeRF-based pseudo-label generation, and iterative self-training—MapRF achieves competitive results on Argoverse 2 and nuScenes, reaching about 50% mAP on BEV/3D maps without 3D labels and outperforming several 2D-label baselines. These findings demonstrate the practicality of scalable, cost-efficient, online HD map construction for autonomous driving and open avenues for further improvements with alternative 3D representations.

Abstract

Autonomous driving systems benefit from high-definition (HD) maps that provide critical information about road infrastructure. The online construction of HD maps offers a scalable approach to generate local maps from on-board sensors. However, existing methods typically rely on costly 3D map annotations for training, which limits their generalization and scalability across diverse driving environments. In this work, we propose MapRF, a weakly supervised framework that learns to construct 3D maps using only 2D image labels. To generate high-quality pseudo labels, we introduce a novel Neural Radiance Fields (NeRF) module conditioned on map predictions, which reconstructs view-consistent 3D geometry and semantics. These pseudo labels are then iteratively used to refine the map network in a self-training manner, enabling progressive improvement without additional supervision. Furthermore, to mitigate error accumulation during self-training, we propose a Map-to-Ray Matching strategy that aligns map predictions with camera rays derived from 2D labels. Extensive experiments on the Argoverse 2 and nuScenes datasets demonstrate that MapRF achieves performance comparable to fully supervised methods, attaining around 75% of the baseline while surpassing several approaches using only 2D labels. This highlights the potential of MapRF to enable scalable and cost-effective online HD map construction for autonomous driving.
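
The abstract says that Map-to-Ray Matching aligns map predictions with camera rays derived from 2D labels, but this summary does not spell out the matching cost. Purely as a geometric illustration, the sketch below measures how far predicted 3D points lie from a set of label rays. The names `point_to_ray_distance` and `rays_to_points_cost`, and the choice of a nearest-point mean distance, are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, direction); direction must be non-zero


def point_to_ray_distance(point: Vec3, ray: Ray) -> float:
    """Euclidean distance from a 3D point to a half-line starting at the ray origin."""
    origin, direction = ray
    rel = [point[k] - origin[k] for k in range(3)]
    dir_sq = sum(c * c for c in direction)
    # Parameter of the closest point on the ray, clamped so it stays in front of the origin.
    s = max(0.0, sum(rel[k] * direction[k] for k in range(3)) / dir_sq)
    return math.sqrt(sum((rel[k] - s * direction[k]) ** 2 for k in range(3)))


def rays_to_points_cost(points: Sequence[Vec3], rays: Sequence[Ray]) -> float:
    """Hypothetical cost: for each label ray, distance to its nearest predicted point, averaged."""
    return sum(min(point_to_ray_distance(p, r) for p in points) for r in rays) / len(rays)


if __name__ == "__main__":
    predicted = [(2.0, 3.0, 0.0), (5.0, 1.0, 0.0)]   # made-up predicted map points
    label_rays = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))]  # made-up ray from a 2D label
    print(rays_to_points_cost(predicted, label_rays))
```

A cost of this kind is small when a predicted map element lies along the viewing rays of its 2D label, whatever its depth along those rays, which is why ray-based alignment can use 2D annotations without 3D ground truth.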

Paper Structure

This paper contains 11 sections, 12 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Motivation for MapRF. Compared to existing methods, MapRF learns from accessible 2D image labels to construct 3D HD maps online. We generate pseudo labels through the proposed NeRF module and use them for self-training. This design reduces data collection and annotation costs, thereby improving scalability.
  • Figure 2: Overall framework of MapRF. The framework learns to construct 3D HD maps online using only 2D image annotations. We first train an initial map model with weak labels generated via IPM. We then optimize a Map-Conditioned NeRF (MC-NeRF) with multi-view labels to generate pseudo labels. We iteratively retrain the map model with pseudo labels and re-optimize MC-NeRF, forming a self-training loop. We further employ a Map-to-Ray Matching (MRM) strategy to mitigate the risk of concept drift during self-training.
  • Figure 3: Weak vs. pseudo labels. When roads exhibit slopes or elevation changes, projecting 2D labels onto a single plane introduces positional errors and geometric distortions. Although heuristic constraints can alleviate these issues, our pseudo labels yield representations that are more geometrically accurate.
  • Figure 4: Analysis of Self-Training Results. Both pseudo-label quality and model performance improve progressively across self-training rounds, revealing a positive feedback loop.
  • Figure 5: Qualitative Results. MapRF produces geometrically accurate maps in diverse scenes. Multi-view images are overlaid with projected predictions from MapRF (round 3). Orange, blue, and green denote lane dividers, pedestrian crossings, and road boundaries, respectively.
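
The captions for Figures 2 and 3 refer to weak labels generated via inverse perspective mapping (IPM) and to the distortions that appear when 2D labels are projected onto a single plane. As a rough illustration only, the sketch below back-projects a pixel onto a flat ground plane using a pinhole camera model. The `PinholeCamera` class, the function names, the axis conventions (world: x forward, y left, z up; camera: x right, y down, z forward), and all numeric values are assumptions made for this example, not details from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PinholeCamera:
    """Hypothetical pinhole camera; every field is an assumption of this sketch."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: Tuple[Vec3, Vec3, Vec3]  # camera-to-world rotation, row-major
    position: Vec3                     # camera centre in the world frame


def project(cam: PinholeCamera, point: Vec3) -> Tuple[float, float]:
    """Project a world point (assumed to be in front of the camera) to pixel coordinates."""
    rel = [point[k] - cam.position[k] for k in range(3)]
    # World-to-camera is the transpose of the camera-to-world rotation.
    x, y, z = (sum(cam.rotation[r][c] * rel[r] for r in range(3)) for c in range(3))
    return cam.cx + cam.fx * x / z, cam.cy + cam.fy * y / z


def ipm_to_ground(cam: PinholeCamera, u: float, v: float,
                  ground_z: float = 0.0) -> Optional[Vec3]:
    """Intersect the viewing ray of pixel (u, v) with the plane z = ground_z."""
    d_cam = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
    d = [sum(row[k] * d_cam[k] for k in range(3)) for row in cam.rotation]
    if d[2] == 0.0:
        return None  # ray is parallel to the plane
    s = (ground_z - cam.position[2]) / d[2]
    if s <= 0.0:
        return None  # intersection would lie behind the camera
    return tuple(cam.position[k] + s * d[k] for k in range(3))


if __name__ == "__main__":
    # Made-up forward-facing camera mounted 1.5 m above the assumed ground plane.
    cam = PinholeCamera(
        fx=1000.0, fy=1000.0, cx=800.0, cy=450.0,
        rotation=((0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)),
        position=(0.0, 0.0, 1.5),
    )
    true_point = (20.0, 0.0, 0.5)  # road point 0.5 m above the assumed plane
    u, v = project(cam, true_point)
    print("pixel:", (u, v))
    print("flat-ground IPM estimate:", ipm_to_ground(cam, u, v))
```

With these made-up values, a road point that actually sits 20 m ahead and 0.5 m above the assumed plane is placed about 30 m ahead by the flat-ground projection. This is the kind of positional error on sloped roads that the Figure 3 caption describes.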