SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion

Runfa Blark Li, Keito Suzuki, Bang Du, Ki Myung Brian Lee, Nikolay Atanasov, Truong Nguyen

TL;DR

SplatSDF introduces architecture-level fusion of Gaussian Splatting (3DGS) with neural implicit SDF (SDF-NeRF) to recover continuous 3D geometry from multi-view images. By training with 3DGS input and employing a dedicated 3DGS Aggregator and a Surface 3DGS Fusion strategy, it achieves faster convergence and higher geometric and photometric accuracy than state-of-the-art SDF-NeRF methods, while retaining the same inference cost as prior approaches. Key findings include improved Chamfer Distance and PSNR on DTU and NeRF Synthetic datasets, and ablations showing the superiority of anchor-point surface fusion over dense fusion. The work suggests a practical path to more accurate and efficient neural implicit representations for robotics and graphics tasks, with explicit 3D priors guiding the SDF learning during training only.

Abstract

A signed distance function (SDF) is a useful representation for continuous-space geometry and many related operations, including rendering, collision checking, and mesh generation. Hence, reconstructing SDF from image observations accurately and efficiently is a fundamental problem. Recently, neural implicit SDF (SDF-NeRF) techniques, trained using volumetric rendering, have gained a lot of attention. Compared to earlier truncated SDF (TSDF) fusion algorithms that rely on depth maps and voxelize continuous space, SDF-NeRF enables continuous-space SDF reconstruction with better geometric and photometric accuracy. However, the accuracy and convergence speed of scene-level SDF reconstruction require further improvements for many applications. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, several works have focused on improving SDF-NeRF by introducing consistency losses on depth and surface normals between 3DGS and SDF-NeRF. However, loss-level connections alone lead to only incremental improvements. We propose a novel neural implicit SDF called "SplatSDF" to fuse 3DGS and SDF-NeRF at an architecture level with significant boosts to geometric and photometric accuracy and convergence speed. Our SplatSDF relies on 3DGS as input only during training, and keeps the same complexity and efficiency as the original SDF-NeRF during inference. Our method outperforms state-of-the-art SDF-NeRF models on geometric and photometric evaluation as of the time of submission.
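
As a minimal illustration of what an SDF provides, the sketch below evaluates an analytic sphere SDF and uses it for a point collision check. It is not part of SplatSDF: the function names `sphere_sdf` and `in_collision` are hypothetical, and the sign convention (negative inside, positive outside) is an assumption, since the abstract does not state one.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere (assumed convention: negative inside, positive outside)."""
    return math.dist(point, center) - radius


def in_collision(point, sdf, margin=0.0):
    """Treat a point as colliding when its signed distance is at most `margin`."""
    return sdf(point) <= margin


if __name__ == "__main__":
    print(sphere_sdf((2.0, 0.0, 0.0)))                 # 1.0  (one unit outside the surface)
    print(sphere_sdf((0.0, 0.0, 0.0)))                 # -1.0 (one unit inside the surface)
    print(in_collision((0.5, 0.0, 0.0), sphere_sdf))   # True
```

A neural implicit SDF replaces the closed-form expression with a learned network, but downstream queries such as the collision check keep the same shape.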

Paper Structure

This paper contains 22 sections, 11 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Our proposed SplatSDF boosts Neural Implicit SDF via Gaussian Splatting with novel architecture-level fusion strategies. SplatSDF makes it easier to converge to complex geometry (like the holes in the red boxes), achieves greater geometric and photometric accuracy, and $>3\times$ faster convergence compared to the best baseline, Neuralangelo. ("CD" denotes Chamfer Distance).
  • Figure 2: Overview. Our SplatSDF takes posed RGB images and 3DGS to train an SDF-NeRF. We use 3DGS-rendered depths to identify the anchor point and shift the closest query point to the anchor point. With a shared hash encoder, we extract query-point SDF embeddings $e_{sdf}$ and 3DGS embeddings $e_{gs}$. Our method applies a 3DGS aggregator to merge the 3DGS attributes: mean $\mu$, covariance $\Sigma$, color $c$, and spherical harmonics $SH$. We propose a novel surface 3DGS fusion to fuse $e_{gs}$ and $e_{sdf}$ only around the anchor point and regress to SDF. With a density function, SDF is converted to per-point density $\sigma(x)$. We take the geometric features $g(x)$, the surface normal from SDF $n(x)$, the query-point coordinates $x$ and the viewing angle $v$ to estimate per-point color $c(x)$ and obtain the per-pixel color $\hat{C}$ by volumetric rendering to supervise with input images. Our core design is in the 3DGS aggregator and the 3DGS fusion.
  • Figure 3: Dense 3D Fusion vs Surface 3D Fusion. Left: Dense 3DGS fusion on all valid query points (green points). Right: Surface 3DGS fusion only on the anchor point (black point). Fusing query points inside of surfaces using spurious GS (orange ellipsoids) far from the true surface leads to bumpy surface artifacts.
  • Figure 4: Surface Mesh Comparison on the NeRF Synthetic Dataset. Left to right: Ground Truth mesh (color is not available), SplatSDF, Neuralangelo, SuGAR. Rows 1-2: Ficus. Rows 3-4: Lego. Rows 5-6: Ship. Zoom in to check details in red boxes. No red boxes are drawn for SuGAR since it is overall worse than the SDF-NeRFs.
  • Figure 5: Tolerance to erroneous 3DGS initialized from a noisy point cloud. First row: GS centers (yellow) overlaid on the estimated surface mesh. Noisy GS centers are drawn in red and highlighted by the red boxes. Second row: The GS-rendered depth shows no noise in the red boxes.
  • ...and 5 more figures
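
The per-ray steps described in the Figure 2 caption (anchor point from the 3DGS-rendered depth, shifting the closest query point, SDF-to-density conversion, volumetric rendering) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the depth is assumed to be measured along a unit-length ray, the density function is a VolSDF-style Laplace CDF swapped in because the caption does not specify one, and the 3DGS aggregator and embedding fusion are omitted entirely.

```python
import math


def anchor_point(origin, direction, gs_depth):
    """Back-project a 3DGS-rendered depth along a unit-length ray (assumed convention)."""
    return tuple(o + gs_depth * d for o, d in zip(origin, direction))


def shift_closest_sample(t_samples, gs_depth):
    """Move the sample distance closest to the rendered depth onto the depth itself."""
    idx = min(range(len(t_samples)), key=lambda i: abs(t_samples[i] - gs_depth))
    shifted = list(t_samples)
    shifted[idx] = gs_depth
    return shifted, idx


def sdf_to_density(sdf, beta=0.05):
    """Laplace-CDF density (VolSDF-style, an assumption): large inside, near zero outside."""
    if sdf >= 0.0:
        cdf = 0.5 * math.exp(-sdf / beta)
    else:
        cdf = 1.0 - 0.5 * math.exp(sdf / beta)
    return cdf / beta


def render_ray(densities, colors, t_samples):
    """Volumetric rendering quadrature: per-sample weights and the composited pixel color."""
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    weights = []
    last = len(t_samples) - 1
    for i, (sigma, color) in enumerate(zip(densities, colors)):
        # Interval length to the next sample; the final sample reuses the previous spacing.
        delta = t_samples[i + 1] - t_samples[i] if i < last else t_samples[i] - t_samples[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, weights


if __name__ == "__main__":
    origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    gs_depth = 1.03                                   # toy stand-in for a 3DGS-rendered depth
    t_samples = [0.25 * i for i in range(9)]          # uniform samples on [0, 2]
    shifted, idx = shift_closest_sample(t_samples, gs_depth)
    # Toy SDF of a plane at the rendered depth: positive in front of it, negative behind.
    densities = [sdf_to_density(gs_depth - t) for t in shifted]
    colors = [(1.0, 0.5, 0.0)] * len(shifted)
    pixel, weights = render_ray(densities, colors, shifted)
    print(anchor_point(origin, direction, gs_depth), idx)
    print(pixel, max(weights))
```

In this toy setup the largest rendering weight falls on the shifted sample, which is the location where, according to the caption, the surface 3DGS fusion would combine $e_{gs}$ and $e_{sdf}$.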