See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization

Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam

TL;DR

This work proposes two regularizers for sparse-view 3DGS: a semantic regularization that uses features extracted from the pretrained DINO-ViT model to ensure multi-view semantic consistency, and a local depth regularization that constrains depth values to improve generalization on unseen views.

Abstract

3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis. However, its rendering quality deteriorates with sparse input views, leading to distorted content and reduced detail. This limitation hinders its practical application. To address this issue, we propose a sparse-view 3DGS method. Given the inherently ill-posed nature of sparse-view rendering, incorporating prior information is crucial. We propose a semantic regularization technique, using features extracted from the pretrained DINO-ViT model, to ensure multi-view semantic consistency. Additionally, we propose local depth regularization, which constrains depth values to improve generalization on unseen views. Our method outperforms state-of-the-art novel view synthesis approaches, achieving up to 0.4 dB improvement in terms of PSNR on the LLFF dataset, with reduced distortion and enhanced visual quality.
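
As a rough illustration of the multi-view semantic consistency idea, one could compare feature vectors extracted from a rendered view with those from a reference view. The sketch below is not the authors' implementation: the abstract does not state which distance is used, so the cosine distance, the function names `cosine_distance` and `semantic_loss`, and the use of plain Python lists as stand-ins for DINO-ViT features are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(a, b); 0 means the two vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def semantic_loss(
    rendered_feats: Sequence[Sequence[float]],
    reference_feats: Sequence[Sequence[float]],
) -> float:
    """Mean cosine distance over paired feature vectors.

    Assumption: each pair holds features for the same scene content seen
    from two views (hypothetical stand-ins for DINO-ViT features).
    """
    pairs = list(zip(rendered_feats, reference_feats))
    if not pairs:
        raise ValueError("at least one feature pair is required")
    return sum(cosine_distance(r, t) for r, t in pairs) / len(pairs)
```

A loss of this kind is zero when paired features agree in direction and grows as they diverge, which is the behavior one would want from a term that penalizes semantic inconsistency between views.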

Paper Structure (12 sections, 7 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Visual results of FreeNeRF [Yang2023FreeNeRF], RegNeRF [niemeyer2022regnerf], FSGS [zhu2023FSGS], and our SIDGaussian.
  • Figure 2: The overall pipeline of our proposed SIDGaussian. A sparse point cloud for the 3D Gaussian initialization is generated from sparse views using SfM. In addition to the $\mathcal{L}_{0}$ loss, the proposed local depth regularization $\mathcal{L}_\text{depth}$ and semantic regularization $\mathcal{L}_\text{sem}$ are applied during training.
  • Figure 3: Visual results of the scenes "Leaves" and "Horns" generated by FreeNeRF [Yang2023FreeNeRF], RegNeRF [niemeyer2022regnerf], FSGS [zhu2023FSGS], and our SIDGaussian.
  • Figure 4: Visual results of our method with/without semantic regularization and local depth regularization.
  • Figure 5: The influence of the weights $w_{\text{sem}}$ and $w_{\text{depth}}$ in terms of PSNR score.
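
The captions of Figures 2 and 5 mention a base loss $\mathcal{L}_{0}$, two regularizers, and two weights. One plausible reading, given here as an assumption and not as the paper's stated equation, is a weighted sum of the three terms:

```latex
\mathcal{L}_\text{total}
  = \mathcal{L}_{0}
  + w_{\text{sem}} \, \mathcal{L}_\text{sem}
  + w_{\text{depth}} \, \mathcal{L}_\text{depth}
```

Under this reading, setting $w_{\text{sem}} = 0$ or $w_{\text{depth}} = 0$ would disable the corresponding regularizer, which matches the with/without comparison described for Figure 4. The symbol $\mathcal{L}_\text{total}$ is introduced here for illustration only.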