MapGS: Generalizable Pretraining and Data Augmentation for Online Mapping via Novel View Synthesis

Hengyuan Zhang, David Paz, Yuliang Guo, Xinyu Huang, Henrik I. Christensen, Liu Ren

TL;DR

The paper addresses cross-sensor generalization in online mapping for autonomous driving by introducing MapGS, a data-generation framework that uses Gaussian splatting to reconstruct scenes and render images in a target sensor configuration. The authors create nuAV2, a dataset of reconstructed Argoverse 2 (AV2) scenes rendered in the nuScenes camera configuration, and use it for pretraining and joint training. This improves cross-configuration generalization, accelerates training, and reduces labeling needs, with notable gains even when target-domain data is limited. The key contributions are the nuAV2 dataset, a data-regeneration recipe, and demonstrated improvements (e.g., an 18% performance boost and the ability to surpass Oracle performance with only 25% of target data). The work enables data reuse across sensor setups, offering a practical path toward scalable, surround-view online mapping with a reduced labeling burden.

Abstract

Online mapping reduces the reliance of autonomous vehicles on high-definition (HD) maps, significantly enhancing scalability. However, recent advancements often overlook cross-sensor configuration generalization, leading to performance degradation when models are deployed on vehicles with different camera intrinsics and extrinsics. With the rapid evolution of novel view synthesis methods, we investigate the extent to which these techniques can be leveraged to address the sensor configuration generalization challenge. We propose a novel framework leveraging Gaussian splatting to reconstruct scenes and render camera images in target sensor configurations. The sensor data rendered in the target configuration, along with labels mapped to that configuration, are used to train online mapping models. Evaluated on the nuScenes and Argoverse 2 datasets, our framework demonstrates a performance improvement of 18% through effective dataset augmentation, achieves faster convergence and efficient training, and exceeds state-of-the-art performance when using only 25% of the original training data. This enables data reuse and reduces the need for laborious data labeling. Project page at https://henryzhangzhy.github.io/mapgs.
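
To make the role of camera intrinsics and extrinsics concrete, the following is a minimal sketch of a pinhole projection, not the paper's implementation. The class name `PinholeCamera`, the frame convention (x right, y down, z forward, shared by the vehicle and camera frames for simplicity), and all numeric values are assumptions chosen for illustration; they are not the calibration of nuScenes or Argoverse 2.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PinholeCamera:
    """Hypothetical camera: intrinsics plus a camera-from-vehicle transform."""

    fx: float  # focal length in pixels (horizontal)
    fy: float  # focal length in pixels (vertical)
    cx: float  # principal point, horizontal pixel coordinate
    cy: float  # principal point, vertical pixel coordinate
    rotation: Sequence[Sequence[float]]  # 3x3, vehicle frame -> camera frame
    translation: Vec3  # vehicle origin expressed in the camera frame

    def project(self, point_vehicle: Vec3) -> Optional[Tuple[float, float]]:
        # Extrinsics: p_cam = R * p_vehicle + t
        x, y, z = (
            sum(r * p for r, p in zip(row, point_vehicle)) + t
            for row, t in zip(self.rotation, self.translation)
        )
        if z <= 0.0:
            return None  # point is behind the image plane
        # Intrinsics: perspective division, then scale and shift to pixels
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Two made-up configurations that differ in focal length and mounting offset.
source_cam = PinholeCamera(1000.0, 1000.0, 800.0, 450.0, IDENTITY, (0.0, 0.0, 0.0))
target_cam = PinholeCamera(1200.0, 1200.0, 800.0, 450.0, IDENTITY, (0.0, 0.5, 0.0))

point = (2.0, 1.0, 10.0)  # one 3D point, e.g. on a lane boundary
uv_source = source_cam.project(point)
uv_target = target_cam.project(point)
```

The same 3D point lands on different pixels in the two configurations, which is the kind of shift a model trained on one sensor setup faces when deployed on another.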

Paper Structure

This paper contains 19 sections, 3 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Cross-sensor data alignment. Online mapping algorithms struggle when deployed on a vehicle with a different sensor configuration and require labeled data with the same sensor configuration. With the source sensor configuration images collected by Argoverse 2 (AV2) data collection vehicles (top row), we propose to leverage Gaussian splatting to render images in the target nuScenes (NUSC) sensor configuration (bottom row). The synthesized dataset, named nuAV2, is used to train online mapping algorithms to reduce the generalization gap using different training paradigms.
  • Figure 2: MapGS Pipeline. Deploying online mapping models on a different sensor configuration is challenging. MapGS leverages Street Gaussian (StreetGS) to reconstruct the scene, then renders images in the target sensor configuration. We then train a model with this data and the corresponding labels. Finally, we test the model in the target sensor configuration.
  • Figure 3: PVG Distortion. While the reconstruction along the trajectory (top right) has high quality compared to the ground truth (top left), moving the camera away from the trajectory, such as 1 m backwards (bottom left) or 1 m left (bottom right), causes the quality to drop significantly.
  • Figure 4: nuAV2 Examples. nuAV2 renders the reconstructed AV2 dataset in the NUSC sensor configuration.
  • Figure 5: Low data fine-tuning. Models pretrained with nuAV2 achieve high performance rapidly, often surpassing their baseline convergence performance, whereas pretraining with AV2 not only slows down training but also reduces overall performance.
  • ...and 4 more figures