Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting

Raja Kumar, Vanshika Vats

TL;DR

This work proposes a depth-aware Gaussian splatting method that uses monocular depth prediction as a prior, along with a scale-invariant depth loss, to constrain the 3D shape and avoid overfitting when only a few input views are available.

Abstract

3D Gaussian splatting has surpassed neural radiance field methods in novel view synthesis by achieving lower computational costs and real-time high-quality rendering. Although it produces high-quality renderings from many input views, its performance drops significantly when only a few views are available. In this work, we address this by proposing a depth-aware Gaussian splatting method for few-shot novel view synthesis. We use monocular depth prediction as a prior, along with a scale-invariant depth loss, to constrain the 3D shape under just a few input views. We also model color using lower-order spherical harmonics to avoid overfitting. Further, we observe that periodically removing splats with lower opacity, as performed in the original work, leads to a very sparse point cloud and, hence, a lower-quality rendering. To mitigate this, we retain all the splats, leading to a better reconstruction in few-view settings. Experimental results show that our method outperforms the traditional 3D Gaussian splatting methods by achieving improvements of 10.5% in peak signal-to-noise ratio, 6% in structural similarity index, and 14.1% in perceptual similarity, thereby validating the effectiveness of our approach. The code will be made available at: https://github.com/raja-kumar/depth-aware-3DGS
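
The abstract does not spell out the scale-invariant depth loss, so the following is only a minimal sketch of one widely used form, the log-depth error of Eigen et al. (2014); the paper's exact formulation may differ. The function name, the `lam` and `eps` parameters, and the sample depth values are all assumptions made for illustration. The sketch shows why such a loss suits a monocular depth prior: monocular predictions are only defined up to an unknown scale, and with `lam=1.0` the loss ignores any global scale factor between the rendered and predicted depths.

```python
import math


def scale_invariant_depth_loss(pred, target, lam=1.0, eps=1e-6):
    """Scale-invariant log-depth error over paired depth values.

    Illustrative only: `pred` stands for rendered depths and `target` for
    monocular depth predictions, both as flat lists of positive floats.
    With lam=1.0 the result equals the variance of the log-depth
    differences, so a global rescaling of `pred` leaves it unchanged.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    # Per-pixel difference in log space; clamp to eps to avoid log(0).
    d = [math.log(max(p, eps)) - math.log(max(t, eps))
         for p, t in zip(pred, target)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    return mean_sq - lam * mean * mean


# Hypothetical depth samples, not taken from the paper.
rendered = [1.0, 2.0, 4.0, 8.0]
monocular = [0.5, 1.1, 1.9, 4.2]

loss = scale_invariant_depth_loss(rendered, monocular)
# Multiplying every rendered depth by 3 shifts all log differences by the
# same constant, which the second term cancels when lam=1.0.
rescaled = scale_invariant_depth_loss([3.0 * r for r in rendered], monocular)
```

In a training loop this term would be computed on depth tensors and added to the photometric loss; the weighting between the two terms is not stated in the text above and would need to be taken from the paper or its code.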

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An overview of our proposed method. We start with just 5 training inputs from different viewpoints, collect Structure-from-Motion (SfM) points, and initialize the 3D Gaussians from them. We then use our modified differentiable tile rasterizer to render depth. In addition to photometric loss, we also compute scale-invariant depth loss and use it as supervision.
  • Figure 2: Structure-from-Motion (SfM) point clouds extracted using COLMAP [schoenberger2016sfm] from (a) the original 20 views, and (b) 5 views. Notice how sparse the SfM points are for the 5 views from which we initialize our training.
  • Figure 3: We compare our results with the original 3DGS and observe better rendering quality using our method. The red box shows zoomed-in output for better reference.
  • Figure 4: Visualization of point clouds generated from our method (right column) as compared to the original 3DGS (left column). Notice that our method forms cleaner and sharper point clouds than the original 3DGS. The red boxes show zoomed-in outputs for better reference and clarity.