GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction

Hanyue Zhang, Zhiliu Yang, Xinhe Zuo, Yuxin Tong, Ying Long, Chen Liu

TL;DR

Evaluation on the Mill19, Urban3D, and MatrixCity datasets shows that the proposed 3DGS method consistently produces higher-fidelity renderings than state-of-the-art methods for large-scale scene reconstruction.

Abstract

This paper proposes a novel framework for large-scale scene reconstruction based on 3D Gaussian splatting (3DGS), aiming to address the scalability and accuracy challenges faced by existing methods. To tackle the scalability issue, we split the large scene into multiple cells, and the candidate point cloud and camera views of each cell are associated through visibility-based camera selection and progressive point-cloud extension. To reinforce rendering quality, we make three improvements over vanilla 3DGS: a ray-Gaussian intersection strategy and a novel Gaussian density control for learning efficiency; an appearance decoupling module based on a ConvKAN network to handle uneven lighting conditions in large-scale scenes; and a refined final loss combining the color loss, the depth distortion loss, and the normal consistency loss. Finally, a seamless stitching procedure merges the individual Gaussian radiance fields to enable novel view synthesis across different cells. Evaluation on the Mill19, Urban3D, and MatrixCity datasets shows that our method consistently produces higher-fidelity renderings than state-of-the-art methods for large-scale scene reconstruction. We further validate the generalizability of the proposed approach by rendering self-collected video clips recorded by a commercial drone.
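
The cell-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `partition_scene`, the regular grid, the `min_visible_ratio` threshold, and the use of "fraction of a camera's observed points that fall inside the cell" as the visibility criterion are all assumptions made for illustration. The abstract only states that cells are formed and that cameras and points are associated through visibility-based selection and progressive point-cloud extension.

```python
import numpy as np


def partition_scene(points, cam_centers, visibility, grid=(2, 2), min_visible_ratio=0.25):
    """Split a point cloud into grid cells and pick cameras and points per cell.

    points:      (N, 3) sparse points, assumed already axis-aligned.
    cam_centers: (C, 3) camera positions.
    visibility:  (C, N) boolean; visibility[c, n] is True if camera c observes point n.
    All thresholds and the grid layout are hypothetical choices.
    """
    grid_arr = np.asarray(grid)
    xy = points[:, :2]
    lo = xy.min(axis=0)
    span = np.maximum(xy.max(axis=0) - lo, 1e-9)

    def to_cell(coords_xy):
        # Map ground-plane coordinates to a flat cell index, clamped to the grid.
        idx = ((coords_xy - lo) / span * grid_arr).astype(int)
        idx = np.clip(idx, 0, grid_arr - 1)
        return idx[:, 0] * grid[1] + idx[:, 1]

    cell_of_point = to_cell(xy)
    cell_of_cam = to_cell(cam_centers[:, :2])
    seen_total = np.maximum(visibility.sum(axis=1), 1)

    cells = []
    for cell in range(grid[0] * grid[1]):
        in_cell = cell_of_point == cell
        # Assumed visibility criterion: share of a camera's points lying in this cell.
        ratio = visibility[:, in_cell].sum(axis=1) / seen_total
        cams = np.flatnonzero((cell_of_cam == cell) | (ratio >= min_visible_ratio))
        # Point-cloud extension: also keep points observed by the selected cameras.
        extended = in_cell | visibility[cams].any(axis=0)
        cells.append({"cameras": cams, "points": np.flatnonzero(extended)})
    return cells
```

Each returned cell carries its own camera and point indices, so cells can be trained independently and merged afterwards; overlapping points near borders are what later stitching would need to reconcile.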

Paper Structure

This paper contains 27 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Rendered RGB images and corresponding rendered depth normals from our GaRField++ framework on the self-collected data. Randomly rendered images from multiple views of the large-scale scenes are complete, smooth and detailed. This is achieved by constructing a divide-and-conquer Gaussian radiance field, which is reinforced by precisely modeling the color and opacity information and improving the training efficiency. The data is collected from the monocular camera of a DJI drone.
  • Figure 2: Overview of our GaRField++ framework. Scene Partitioning: We perform a sparse reconstruction with Structure-from-Motion (SfM), generating a point cloud and estimating the initial camera pose for each image. Concurrently, we apply Manhattan alignment to the point cloud. We then employ coordinate-based regionalization and a visibility-based view selection strategy to split the point cloud. Cell Rendering: By leveraging the ray-Gaussian intersection model, enhanced Gaussian density control, and decoupled appearance modeling based on a convolutional KAN (Kolmogorov-Arnold Network), we obtain the reconstruction results for each partition. Optimization: We employ a newly constructed loss function to optimize the training process. This loss function comprises a depth distortion loss, a normal consistency loss, and a color loss, thereby enhancing the accuracy and efficiency of large-scale reconstruction. Novel View Synthesis: We seamlessly stitch together the separate Gaussian fields from the individual cells to obtain a complete Gaussian field for the large-scale scene. This step lets the full model render across cell borders, which makes novel view synthesis over the whole scene possible.
  • Figure 3: Architecture of our ConvKAN-based decoupled appearance modeling.
  • Figure 4: Qualitative comparison with SOTA. The first row shows the Rubble scene, the second row shows the Building scene, and the third and fourth rows show small_city scenes from the MatrixCity dataset. The experiment demonstrates the superior capability of our GaRField++ framework in preserving color fidelity: its rendered images more closely resemble the original images. Regions of interest are marked with red boxes and zoomed in. (Best viewed with zoom-in.)
  • Figure 5: Comparison of our method with 3DGS on self-collected data. (a) The original image from the Campus-YNU scenario. (b) The image rendered using 3DGS, where the solar panels and the trees exhibit a degree of blurriness. (c) The image rendered with our proposed method, showing a visible improvement in the clarity of the solar panels and the trees. (Best viewed with zoom-in.)
  • ...and 1 more figures
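
The Figure 2 caption names three loss terms: color, depth distortion, and normal consistency. The sketch below shows how such terms are often combined for a single ray. The expressions follow the form commonly used in surface-oriented Gaussian splatting work (pairwise depth spread weighted by blending weights, and one minus the cosine between per-Gaussian normals and a reference normal). Whether the paper uses exactly these expressions is an assumption, the color term is simplified to plain L1, and the weights `lambda_d` and `lambda_n` are hypothetical placeholders.

```python
import numpy as np


def depth_distortion(weights, depths):
    """Sum over pairs (i, j) of w_i * w_j * |z_i - z_j| along one ray."""
    diff = np.abs(depths[:, None] - depths[None, :])
    return float(np.sum(weights[:, None] * weights[None, :] * diff))


def normal_consistency(weights, normals, reference_normal):
    """Sum over i of w_i * (1 - n_i . N), with unit normals n_i and N."""
    cos = normals @ reference_normal
    return float(np.sum(weights * (1.0 - cos)))


def color_loss(rendered, target):
    """Plain L1 color error (a simplification; other color terms are possible)."""
    return float(np.mean(np.abs(rendered - target)))


def total_loss(rendered, target, weights, depths, normals, reference_normal,
               lambda_d=0.1, lambda_n=0.05):
    """Weighted sum of the three terms; lambda_d and lambda_n are assumed values."""
    return (color_loss(rendered, target)
            + lambda_d * depth_distortion(weights, depths)
            + lambda_n * normal_consistency(weights, normals, reference_normal))
```

The depth distortion term is zero when all contributing Gaussians on a ray sit at the same depth, and the normal term is zero when their normals agree with the reference normal, so both push the Gaussians toward a thin, consistent surface.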