
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes

Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang

TL;DR

HO-Gaussian addresses the limitations of 3D Gaussian Splatting in urban scenes by removing its reliance on SfM point initialization and introducing a grid-based volume to guide Gaussian optimization. It introduces Gaussian Position Encoding and Gaussian Directional Encoding to represent geometry and view-dependent color efficiently, Neural Warping to improve consistency across multiple cameras, and Point Densitification to populate low-texture and distant regions. The method jointly optimizes the Gaussian parameters and the grid-volume attributes with a hybrid loss, achieving real-time, photo-realistic novel view rendering on Waymo and Argoverse without heavy storage demands. Experiments show substantial improvements over both NeRF-based urban methods and 3DGS baselines that rely on SfM or LiDAR initialization, with favorable texture and geometry quality and improved efficiency on large-scale urban scenes.
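
The TL;DR says a grid-based volume guides Gaussian optimization and that Gaussian Position Encoding represents geometry. One common way to read per-point features out of such a volume is trilinear interpolation of a feature grid at each Gaussian center. The sketch below assumes a dense, cubic grid over the unit cube with one feature vector per vertex; the grid layout and the names `trilinear_sample` and `make_linear_grid` are illustrative assumptions, not the paper's implementation, which may use a different (for example multi-resolution) grid.

```python
from typing import List, Sequence

# Hypothetical dense feature grid: grid[i][j][k] is the feature vector stored at
# vertex (i, j, k) of a regular lattice spanning the unit cube [0, 1]^3.
Grid = List[List[List[List[float]]]]


def trilinear_sample(grid: Grid, p: Sequence[float]) -> List[float]:
    """Trilinearly interpolate vertex features at a point p in [0, 1]^3."""
    res = len(grid) - 1  # cells per axis (assumes the same resolution on every axis)
    idx, frac = [], []
    for c in p:
        x = min(max(c, 0.0), 1.0) * res  # clamp into the volume, convert to cell units
        i = min(int(x), res - 1)         # lower-corner index, kept inside the grid
        idx.append(i)
        frac.append(x - i)               # fractional offset inside the cell
    dim = len(grid[0][0][0])
    out = [0.0] * dim
    # Blend the 8 surrounding corner features with their trilinear weights.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                f = grid[idx[0] + di][idx[1] + dj][idx[2] + dk]
                for d in range(dim):
                    out[d] += w * f[d]
    return out


def make_linear_grid(res: int) -> Grid:
    """Toy grid whose single feature equals x + 2y + 3z at every vertex."""
    n = res + 1
    return [[[[i / res + 2 * j / res + 3 * k / res]
              for k in range(n)] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    grid = make_linear_grid(4)
    # Trilinear interpolation reproduces a linear field, so this is close to 4.0.
    print(trilinear_sample(grid, [0.3, 0.5, 0.9]))
```

In a hybrid pipeline, features sampled this way would typically be decoded by a small network into Gaussian attributes; that decoding step is not shown here.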

Abstract

The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, previous 3DGS-based methods are limited in urban scenes because they rely on initial Structure-from-Motion (SfM) points and struggle to render distant, sky, and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization for rendering urban scenes and incorporates Point Densitification to improve rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Directional Encoding as an alternative to spherical harmonics in the rendering pipeline, enabling view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real time on multi-camera urban datasets.
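
The abstract replaces spherical harmonics with a learned directional encoding for view-dependent color. The details of that encoding are not given here, so the sketch below substitutes a common stand-in: the unit view direction (Gaussian center minus camera center, as 3DGS uses when evaluating spherical harmonics) is passed through a sinusoidal frequency encoding, concatenated with a per-Gaussian feature, and decoded to RGB by a tiny MLP. `DirectionalColorMLP`, `frequency_encode`, the layer sizes, and the random weights are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def frequency_encode(d: Sequence[float], n_freqs: int) -> List[float]:
    """Sinusoidal encoding of a unit direction: [d, sin(2^k d), cos(2^k d), ...]."""
    out = list(d)
    for k in range(n_freqs):
        s = 2.0 ** k
        out += [math.sin(s * c) for c in d] + [math.cos(s * c) for c in d]
    return out


class DirectionalColorMLP:
    """Hypothetical tiny MLP: (per-Gaussian feature, encoded view direction) -> RGB.

    Weights are random here; in training they would be optimized jointly with
    the Gaussians instead of storing per-Gaussian spherical-harmonic coefficients.
    """

    def __init__(self, feat_dim: int, n_freqs: int = 2, hidden: int = 16, seed: int = 0):
        self.n_freqs = n_freqs
        in_dim = feat_dim + 3 * (1 + 2 * n_freqs)
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(3)]

    def color(self, feature: Sequence[float], gaussian_center: Sequence[float],
              camera_center: Sequence[float]) -> List[float]:
        # Unit view direction from the camera toward the Gaussian.
        view_dir = normalize([g - c for g, c in zip(gaussian_center, camera_center)])
        x = list(feature) + frequency_encode(view_dir, self.n_freqs)
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU
        logits = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid keeps RGB in (0, 1)


if __name__ == "__main__":
    mlp = DirectionalColorMLP(feat_dim=8)
    feat = [0.1] * 8
    # The same Gaussian seen from two camera positions generally gets two colors.
    print(mlp.color(feat, [0.0, 0.0, 5.0], [0.0, 0.0, 0.0]))
    print(mlp.color(feat, [0.0, 0.0, 5.0], [3.0, 0.0, 0.0]))
```

The design trade-off is that a shared network plus a small per-Gaussian feature can need less storage than per-Gaussian spherical-harmonic coefficients, at the cost of a network evaluation per Gaussian.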

Paper Structure

This paper contains 24 sections, 12 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Illustration of 3D Gaussian Splatting (3DGS) [kerbl20233d] and HO-Gaussian (ours). Compared with 3DGS initialized from SfM points, our method produces richer Gaussian geometry in low-texture, sky, and distant areas and significantly improves novel view synthesis.
  • Figure 2: Pipeline. The hybrid optimization starts from a grid-based volume that creates a set of Gaussian points; the grid-based volume and the Gaussian pipeline are then optimized iteratively. At regular intervals, point densification supplies new positions to the Gaussian pipeline to populate problematic regions. View-dependent color is encoded by Gaussian Directional Encoding, which replaces spherical harmonics. Finally, neural warping supplies virtual viewpoints to the Gaussian pipeline, improving the consistency of appearance and geometry in multi-camera scenes.
  • Figure 3: Comparison of the densification strategies of 3DGS and HO-Gaussian. The cloning and splitting strategies of 3DGS effectively optimize the Gaussian distribution near the initial SfM points, but they fail in low-texture or distant areas where no initial points exist. HO-Gaussian can learn and optimize Gaussian distributions beyond the initial points. First, Point Densitification supplies the Gaussian pipeline with points missing from the camera views, preventing the projection of empty 2D splats. Next, Neural Warping introduces virtual viewpoints, covering more occluded points. Finally, clone and split operations refine the positions of inaccurate splats, and Gaussian splats whose opacity $\alpha$ falls below a threshold $\epsilon_{\alpha}$ are removed.
  • Figure 4: Comparative results of novel view synthesis on Argoverse datasets. Please zoom in to view the detailed results.
  • Figure 5: Visualization of scene geometry (a) and texture quality synthesized by LocalRF and our method (b). Please zoom in to view detailed results.
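
Figure 3 describes a schedule of adding points, then cloning and splitting, then removing splats whose opacity falls below $\epsilon_{\alpha}$. The sketch below illustrates the 3DGS-style clone/split/prune rules, plus a hypothetical step that seeds Gaussians at externally supplied positions to stand in for Point Densitification. How HO-Gaussian actually proposes those positions (from the grid-based volume) is not reproduced. The defaults loosely follow the reference 3DGS settings (gradient threshold 2e-4, split scale divided by 1.6, pruning opacity 0.005), but `scale_threshold` is absolute here rather than relative to the scene extent. Rotations are omitted, and `densify_and_prune`, `init_scale`, and `init_opacity` are illustrative names.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    position: Vec3
    scale: Vec3              # per-axis standard deviations (rotation omitted)
    opacity: float           # alpha in [0, 1]
    grad_norm: float = 0.0   # accumulated view-space positional gradient magnitude


def densify_and_prune(
    gaussians: List[Gaussian],
    new_positions: Sequence[Vec3],
    grad_threshold: float = 2e-4,
    scale_threshold: float = 0.01,
    eps_alpha: float = 0.005,
    init_scale: float = 0.05,
    init_opacity: float = 0.1,
    split_factor: float = 1.6,
    seed: int = 0,
) -> List[Gaussian]:
    rng = random.Random(seed)
    out: List[Gaussian] = []
    for g in gaussians:
        if g.grad_norm <= grad_threshold:
            out.append(g)  # well reconstructed: keep unchanged
        elif max(g.scale) <= scale_threshold:
            # Clone: a small Gaussian in an under-reconstructed region is duplicated.
            out += [replace(g, grad_norm=0.0), replace(g, grad_norm=0.0)]
        else:
            # Split: a large Gaussian becomes two smaller ones sampled from it.
            small = tuple(s / split_factor for s in g.scale)
            for _ in range(2):
                pos = tuple(p + rng.gauss(0.0, s) for p, s in zip(g.position, g.scale))
                out.append(Gaussian(pos, small, g.opacity))
    # Hypothetical stand-in for Point Densitification: seed Gaussians at positions
    # supplied from outside (e.g. regions the current splats leave empty).
    for p in new_positions:
        out.append(Gaussian(tuple(p), (init_scale,) * 3, init_opacity))
    # Prune splats whose opacity falls below the threshold epsilon_alpha.
    return [g for g in out if g.opacity >= eps_alpha]


if __name__ == "__main__":
    gs = [
        Gaussian((0.0, 0.0, 0.0), (0.005,) * 3, 0.5, grad_norm=1e-3),  # cloned
        Gaussian((1.0, 0.0, 0.0), (0.2,) * 3, 0.5, grad_norm=1e-3),    # split
        Gaussian((2.0, 0.0, 0.0), (0.1,) * 3, 0.001),                  # pruned
    ]
    print(len(densify_and_prune(gs, new_positions=[(5.0, 5.0, 5.0)])))
```

Because cloning and splitting only act on existing Gaussians, regions with no initial points never receive any. The extra `new_positions` input is what lets the sketch grow Gaussians there, which mirrors the motivation given in Figure 3.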