Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty

Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao

TL;DR

The paper tackles realistic rendering of large-scale road scenes by introducing cross-view uncertainty to combine drone aerial imagery with ground-view imagery when training 3D Gaussian Splatting (3D-GS). An ensemble-based ground-view uncertainty is projected into the aerial domain and used to weight aerial pixels during training, which reduces the influence of misaligned or non-overlapping information. A new aerial-ground road scene dataset is synthesized with AirSim/Unreal Engine to enable quantitative evaluation. Empirical results show improved held-out view quality and robust performance under view shifts, surpassing ground-only training and prior baselines, which is relevant to autonomous driving simulation.

Abstract

Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes instead of weighting all pixels equally in 3D-GS training like prior work did. We are the first to introduce the cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.
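The weighting idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`ensemble_uncertainty`, `normalize`, `weighted_l1`), the use of per-pixel variance as the ensemble uncertainty, the min-max normalization, and the plain L1 loss are all assumptions made for illustration. The sketch also assumes the ground-view uncertainty has already been matched to the pixels of an aerial image, a step that is not shown here. Images are flattened to lists of pixel intensities to keep the example dependency-free.

```python
from statistics import pvariance


def ensemble_uncertainty(renders):
    """Per-pixel variance across an ensemble of renders.

    `renders` is a list of flattened images (equal-length lists of floats),
    one per ensemble member. Variance is an assumed uncertainty measure.
    """
    return [pvariance(pixel_values) for pixel_values in zip(*renders)]


def normalize(values):
    """Min-max scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_l1(rendered, target, weights):
    """Mean absolute error where each pixel's error is scaled by its weight."""
    total = sum(w * abs(r - t) for r, t, w in zip(rendered, target, weights))
    return total / len(rendered)


# Hypothetical two-pixel example: the ensemble agrees on pixel 0 and
# disagrees on pixel 1, so only pixel 1 receives a non-zero weight.
ensemble = [[0.0, 0.5], [0.0, 0.7]]
weights = normalize(ensemble_uncertainty(ensemble))
loss = weighted_l1(rendered=[1.0, 1.0], target=[0.0, 0.0], weights=weights)
```

In this toy setup, pixels where the ensemble disagrees contribute more to the aerial loss, which mirrors the stated goal of letting aerial images help where ground images are learned poorly. The ground-image loss would be left unweighted.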

Paper Structure (14 sections, 9 equations, 6 figures, 2 tables)

This paper contains 14 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Qualitative results of our Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty and several baseline methods. The dataset is the 1.6m test set of New York City. Quality improvements are highlighted by boxes.
  • Figure 2: General view of the synthesized dataset. 5°d means 5 degrees downward.
  • Figure 3: Overview of Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty. We first adopt an ensemble-based rendering uncertainty to quantify the learning outcomes of 3D Gaussians on ground images. Next, the ground uncertainty is projected to the air to build the cross-view uncertainty. Subsequently, we introduce the cross-view uncertainty to the training of 3D Gaussians as weight for each pixel of aerial images in the loss function, together with the original rendering loss of 3D-GS for ground images.
  • Figure 4: Results of training various models with ground images only or with both aerial and ground images. (G) denotes training with ground images only; (A+G) denotes training with aerial and ground images.
  • Figure 5: The first row shows the visualization of the cross-view uncertainty, and the second row shows the corresponding aerial data.
  • ...and 1 more figures