Sensing Surface Patches in Volume Rendering for Inferring Signed Distance Functions

Sijia Jiang, Tong Wu, Jing Hua, Zhizhong Han

TL;DR

The paper addresses the challenge of recovering detailed 3D geometry from multi-view images by learning a signed distance function (SDF) through volume rendering while explicitly sensing and constraining surfaces. It introduces a surface-patch sensing mechanism that uses predicted SDF values and gradients to pull nearby queries onto the zero level set, enabling explicit surface constraints such as depth and photometric consistency. The approach combines volume-rendering losses with patch-based surface losses (depth consistency, NCC photometric consistency, and plane-fitting) and demonstrates state-of-the-art performance on indoor benchmarks like Replica and ScanNet. This surface-aware SDF inference improves surface fidelity and suppresses artifacts in empty space, offering a practical advance for neural implicit reconstructions in real-world scenes.
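The NCC photometric consistency term mentioned above can be illustrated with a small, self-contained sketch. It assumes grayscale patches flattened to lists of intensities and uses `1 - NCC` as the loss, which is a common choice rather than a form confirmed by this summary; the function names `ncc` and `ncc_loss` are hypothetical.

```python
import math


def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation between two equally sized intensity patches.

    Returns a value in roughly [-1, 1]; 1 means the patches match up to an
    affine change in brightness (gain and offset).
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    # Covariance and variances, left unnormalized since the 1/n factors cancel.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    var_a = sum((a - mean_a) ** 2 for a in patch_a)
    var_b = sum((b - mean_b) ** 2 for b in patch_b)
    # eps guards against division by zero on textureless (constant) patches.
    return cov / (math.sqrt(var_a * var_b) + eps)


def ncc_loss(patch_a, patch_b):
    """Assumed loss form: small when the two views agree photometrically."""
    return 1.0 - ncc(patch_a, patch_b)


# Intensities of the same surface patch seen in a reference view and in a
# source view with different exposure (gain 2.0, offset 0.3).
reference = [0.1, 0.4, 0.7, 0.9]
source = [2.0 * x + 0.3 for x in reference]
loss_same_surface = ncc_loss(reference, source)
```

Because NCC subtracts the mean and divides by the standard deviation of each patch, the loss stays near zero under exposure changes between views, which is why it suits multi-view comparison better than a raw intensity difference.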

Abstract

It is vital to recover 3D geometry from multi-view RGB images in many 3D computer vision tasks. The latest methods infer the geometry represented as a signed distance field by minimizing the rendering error on the field through volume rendering. However, it is still challenging to explicitly impose constraints on surfaces for inferring more geometry details due to the limited ability of sensing surfaces in volume rendering. To resolve this problem, we introduce a method to infer signed distance functions (SDFs) with a better sense of surfaces through volume rendering. Using the gradients and signed distances, we establish a small surface patch centered at the estimated intersection along a ray by pulling points randomly sampled nearby. Hence, we are able to explicitly impose surface constraints on the sensed surface patch, such as multi-view photo consistency and supervision from depth or normal priors, through volume rendering. We evaluate our method by numerical and visual comparisons on scene benchmarks. Our superiority over the latest methods justifies our effectiveness.
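The pulling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it replaces the learned network $f_{\theta}$ with an analytic unit-sphere SDF, and the names `pull_to_surface` and `sense_patch`, the sampling radius, and the query count are all assumptions made for the example. Each query $q$ is moved along the normalized gradient by its signed distance, $q' = q - f(q)\,\nabla f(q) / \lVert \nabla f(q) \rVert$.

```python
import math
import random


def sphere_sdf(p, radius=1.0):
    """Stand-in for the learned SDF: signed distance to an origin-centered sphere."""
    return math.sqrt(sum(c * c for c in p)) - radius


def sphere_sdf_grad(p):
    """Analytic gradient of the sphere SDF (unit vector pointing outward)."""
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(c / norm for c in p)


def pull_to_surface(q, sdf, grad):
    """Move q along the normalized gradient by its signed distance."""
    d = sdf(q)
    g = grad(q)
    g_norm = math.sqrt(sum(c * c for c in g)) or 1.0  # avoid division by zero
    return tuple(qc - d * gc / g_norm for qc, gc in zip(q, g))


def sense_patch(center, sdf, grad, num_queries=16, radius=0.05, seed=0):
    """Sample queries in a small box around `center` and pull each onto the zero level set."""
    rng = random.Random(seed)
    patch = []
    for _ in range(num_queries):
        q = tuple(c + rng.uniform(-radius, radius) for c in center)
        patch.append(pull_to_surface(q, sdf, grad))
    return patch


# `center` plays the role of the estimated ray-surface intersection.
patch = sense_patch((1.0, 0.0, 0.0), sphere_sdf, sphere_sdf_grad)
max_residual = max(abs(sphere_sdf(p)) for p in patch)
```

For an exact SDF, a single pull lands on the surface, so `max_residual` is at floating-point precision here. With a learned, imperfect SDF the pulled points only approximate the zero level set, which is one reason explicit surface constraints on the patch are useful as supervision.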

Paper Structure

This paper contains 11 sections, 14 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Overview of our method. We infer the SDF $f_{\theta}$ from multi-view images, including RGB images and depth and normal maps that were either captured by sensors or estimated by monocular networks. Using the predicted signed distances and gradients $\nabla f_{\theta}$, we can sense a surface patch $s$ by pulling randomly sampled queries $q$ onto the zero level set, as shown in Fig. \ref{fig:surface} (d). With $s$, we can infer $f_{\theta}$ using both supervision through volume rendering and constraints that are explicitly imposed on the sensed surface $s$.
  • Figure 2: Patch Difference. Current methods mainly impose constraints on single points on the zero level set in (a), rather than on a patch, since it is very hard to obtain a 3D surface patch during volume rendering. Unlike obtaining a patch from depth in (b), our method projects randomly sampled points onto the zero level set to obtain a surface patch that is more continuous and representative in (c).
  • Figure 3: Error map comparison on ScanNet (bigger error: red, smaller error: blue) highlights our superiority.
  • Figure 4: (a) Visual comparison on ScanNet released by NeuralRGBD. We use error maps (bigger error: red, smaller error: blue) to highlight our superiority over NeuralRGBD. (b) Visual comparison on Replica. We use error maps (bigger error: red, smaller error: blue) to highlight our superiority over MonoSDF.
  • Figure 5: Compactness comparison with MonoSDF (GT mesh: gray).
  • ...and 3 more figures