StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting

Zexu Huang, Min Xu, Stuart Perry

TL;DR

StructGS addresses key limitations in 3D Gaussian Splatting-based novel-view synthesis by introducing non-local structural supervision through a patch-based SSIM loss, a dynamic spherical harmonics initialisation strategy sensitive to opacity and inter-sphere distance, and a pre-trained Multi-scale Residual Network for super-resolution rendering. The method combines these with a tailored training loss that switches from D-SSIM to P-SSIM after a threshold iteration, enabling efficient early training and high-fidelity final outputs. Empirical results across multiple datasets show state-of-the-art PSNR, SSIM, and LPIPS scores, with notable improvements in detail and reduction of artifacts, even when training with low-resolution inputs. The approach enables high-resolution rendering from low-resolution data and offers practical benefits for real-world 3D reconstruction and rendering pipelines.
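The loss switch described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the threshold `switch_iter=15000`, and the weights `lam` and `tv_weight` are hypothetical placeholders, and the weighted-sum form is borrowed from the usual 3DGS objective (an L1 term blended with a structural term), which is an assumption here.

```python
def structural_term(iteration, switch_iter, d_ssim_loss, p_ssim_loss):
    """Pick the structural loss: D-SSIM up to switch_iter, P-SSIM afterwards."""
    return d_ssim_loss if iteration <= switch_iter else p_ssim_loss


def total_loss(iteration, l1, d_ssim_loss, p_ssim_loss, tv,
               switch_iter=15000, lam=0.2, tv_weight=0.01):
    """Blend L1, the active structural loss, and a TV regulariser.

    All default values are illustrative assumptions, not values from the paper.
    """
    s = structural_term(iteration, switch_iter, d_ssim_loss, p_ssim_loss)
    return (1.0 - lam) * l1 + lam * s + tv_weight * tv


# Early iteration: the D-SSIM value is used; late iteration: the P-SSIM value.
early = total_loss(100, l1=0.1, d_ssim_loss=0.5, p_ssim_loss=0.9, tv=0.0)
late = total_loss(20000, l1=0.1, d_ssim_loss=0.5, p_ssim_loss=0.9, tv=0.0)
```

In a real training loop only the active structural term would need to be evaluated at each step, which is where the early-training savings would come from.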

Abstract

Recent advancements in 3D reconstruction coupled with neural rendering techniques have greatly improved the creation of photo-realistic 3D scenes, influencing both academic research and industry applications. 3D Gaussian Splatting (3DGS) and its variants incorporate the strengths of both primitive-based and volumetric representations, achieving superior rendering quality. While 3DGS and its variants have advanced the field of 3D representation, they fall short in capturing the stochastic properties of non-local structural information during the training process. Additionally, the initialisation of spherical harmonics in 3DGS-based methods often fails to engage higher-order terms in early training rounds, leading to unnecessary computational overhead as training progresses. Furthermore, current 3DGS-based approaches require training on higher-resolution images to render higher-resolution outputs, significantly increasing memory demands and prolonging training durations. We introduce StructGS, a framework that enhances 3DGS for improved novel-view synthesis in 3D reconstruction. StructGS incorporates a patch-based SSIM loss, dynamic spherical harmonics initialisation and a Multi-scale Residual Network (MSRN) to address the above-mentioned limitations, respectively. Our framework significantly reduces computational redundancy, enhances detail capture and supports high-resolution rendering from low-resolution inputs. Experimentally, StructGS demonstrates superior performance over state-of-the-art (SOTA) models, achieving higher quality and more detailed renderings with fewer artifacts.

Paper Structure

This paper contains 18 sections, 17 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: StructGS: We present a framework that produces high-quality renderings from a set of single-view camera images. Following scene reconstruction, our approach allows for rapid rendering at resolutions higher than that of the input image. To achieve this, we leverage a patch SSIM loss and a total variation (TV) loss regularizer, which effectively capture non-local structural information and enhance image smoothness. Additionally, we propose a dynamic adjustment strategy for spherical harmonics based on the opacity and distance of Gaussian spheres. We also integrate a pre-trained Multi-scale Residual Network to facilitate super-resolution rendering.
  • Figure 2: Overview of StructGS: In the initialisation phase, our model employs a dynamic adjustment of spherical harmonics based on opacity weighting to optimise the first three RGB dimensions for each Gaussian sphere. Distance information further refines initialisation, with higher-order harmonics capturing more details for distant points and lower-order for nearer points. During the training phase, the rendered and ground truth images are divided into several patches. Within these patches, the SSIM loss ($L_{SSIM}$) for small areas is calculated using a kernel. The results are then summed and averaged. The rendered and ground truth images are also assessed for total variation loss ($L_{tv}$) and L1 loss ($L_{1}$). After training, our model incorporates a pre-trained Multi-scale Residual Network to render high-quality and high-resolution images.
  • Figure 3: Qualitative Comparison Results on the Mip-NeRF 360 Dataset [barron2022mip]. The models were trained on images with a resolution of 1.6k, and we simulate a zoom-in setting. Our model attains higher accuracy and finer detail than previous approaches, rendering images that closely match the ground truth.
  • Figure 4: Qualitative Comparison Results across diverse datasets [barron2022mip, xiangli2022bungeenerf, knapitsch2017tanks]. The red boxes and arrows highlight artifacts rendered by state-of-the-art models [yu2024mip, lu2024scaffold], which cause certain areas of the rendered images to appear blurry. The results demonstrate that our model outperforms state-of-the-art models, rendering details with significantly fewer artifacts.
  • Figure 5: Ablation of Dynamic Spherical Harmonics Initialisation. We present an ablation study of the training progression of the bicycle scene [barron2022mip]. The result shows that this strategy improves the training quality in the early iterations.
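
The training-phase losses described in the Figure 2 caption (per-patch SSIM that is summed and averaged, plus a total variation term) can be illustrated with a small dependency-free sketch. It makes several simplifying assumptions that are not taken from the paper: images are single-channel 2D lists with values in [0, 1], patches are non-overlapping, and each patch's SSIM uses uniform statistics over the whole patch instead of a sliding kernel. The function names and the `patch` size are hypothetical.

```python
def ssim_patch(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equally sized flat lists of intensities in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


def patch_ssim_loss(rendered, target, patch=4):
    """1 - mean SSIM over non-overlapping patch x patch tiles (assumed tiling)."""
    h, w = len(rendered), len(rendered[0])
    if h < patch or w < patch:
        raise ValueError("image is smaller than one patch")
    scores = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            rows = range(top, top + patch)
            cols = range(left, left + patch)
            xs = [rendered[r][c] for r in rows for c in cols]
            ys = [target[r][c] for r in rows for c in cols]
            scores.append(ssim_patch(xs, ys))
    return 1.0 - sum(scores) / len(scores)


def tv_loss(img):
    """Anisotropic total variation: summed absolute neighbour differences per pixel."""
    h, w = len(img), len(img[0])
    dh = sum(abs(img[r + 1][c] - img[r][c]) for r in range(h - 1) for c in range(w))
    dw = sum(abs(img[r][c + 1] - img[r][c]) for r in range(h) for c in range(w - 1))
    return (dh + dw) / (h * w)
```

A practical implementation would run on GPU tensors and typically use a Gaussian window inside each patch; the sketch only shows how the per-patch scores are aggregated into one loss value and how the smoothness term is formed.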