Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, Bo Dai

TL;DR

Horizon-GS addresses the need for unified high-fidelity reconstruction of large-scale scenes from both aerial and street views, a setting where prior methods specialized to a single domain struggle to provide seamless free-view rendering. It introduces a two-stage coarse-to-fine training protocol, a balanced camera sampling strategy, and a multi-resolution LOD scheme on top of 3D Gaussian Splatting (with 2D variants for geometry) to reconcile the cross-view discrepancies. A new cross-view dataset with synthetic and real scenes supports training and evaluation, and extensive experiments show state-of-the-art performance in both rendering quality and surface reconstruction for large-scale urban scenes. The approach enables scalable rendering and reconstruction with real-time performance in large environments, offering a practical path toward immersive cross-view experiences in digital twins, autonomous navigation, and VR/AR applications.

Abstract

Seamless integration of aerial and street view images remains a significant challenge in neural scene reconstruction and rendering. Existing methods predominantly focus on a single domain, limiting their applications in immersive environments, which demand extensive free-view exploration with large view changes both horizontally and vertically. We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques that tackles unified reconstruction and rendering for aerial and street views. Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes. We also curate a high-quality aerial-to-ground view dataset encompassing both synthetic and real-world scenes to advance further research. Experiments across diverse urban scene datasets confirm the effectiveness of our method.

Paper Structure

This paper contains 44 sections, 4 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Horizon-GS enables high-quality rendering and reconstruction of aerial-to-ground scenes with unprecedented fidelity across scales, supporting drastic view changes. The colored camera trajectories in the center illustrate the novel viewpoints, while the reconstructed mesh is overlaid on the scene. The surrounding images show the corresponding predicted images for each viewpoint.
  • Figure 2: Pipeline of Horizon-GS. We divide large-scale scenes into chunks. For each chunk, we initialize LOD-structured anchors and conduct the coarse-to-fine training process. Specifically, the coarse stage reconstructs the overall scene, while the fine stage enhances street view details (highlighted in purple). We can derive RGB, depth, and normal images by utilizing different primitive attributes (2D/3D Gaussians) with a single shared underlying structure.
  • Figure 3: (a) Test curves for PSNR and the number of Gaussian primitives across aerial only, street only, and merged views from 15k to 100k iterations on our proposed Road scene. (b) Gradient conflicts restrict the optimization of Gaussian primitives because street views tend to exclude blue Gaussian primitives due to their lower contribution, while aerial views do the opposite.
  • Figure 4: Visualization of our constructed dataset. All 7 scenes contain calibrated aerial and street view images. We illustrate the scenes with their point clouds and the corresponding image capture poses. The trajectory of aerial views is shown in purple, while street views are shown in yellow. Our dataset contains 5 synthetic scenes (a-e) and 2 real scenes (f-g).
  • Figure 5: Qualitative comparisons of Horizon-GS against baselines [huang20242d, kerbl20233d, lu2023scaffold, kerbl2024hierarchical] across (a) small-scale and (b) large-scale scenes.
  • ...and 3 more figures