Ray-Distance Volume Rendering for Neural Scene Reconstruction

Ruihong Yin, Yunlu Chen, Sezer Karaoglu, Theo Gevers

TL;DR

This work tackles indoor neural scene reconstruction by replacing the SDF-derived density with a ray-specific density derived from the Signed Ray Distance Function (SRDF), which reduces the noise that neighboring objects introduce in multi-object scenes. The method jointly predicts the SDF and SRDF, renders with the SRDF-derived density, and enforces sign consistency between the SRDF and SDF through a differentiable loss, complemented by a self-supervised visibility task. It improves reconstruction and view synthesis on ScanNet, Replica, and Tanks and Temples, outperforming strong baselines such as MonoSDF and NeuRIS. Overall, the framework provides more plausible per-ray density distributions and geometry priors, enabling more accurate 3D surfaces and novel-view renderings in indoor environments.

Abstract

Existing methods in neural scene reconstruction utilize the Signed Distance Function (SDF) to model the density function. However, in indoor scenes, the density computed from the SDF for a sampled point may not consistently reflect its real importance in volume rendering, often due to the influence of neighboring objects. To tackle this issue, our work proposes a novel approach for indoor scene reconstruction, which instead parameterizes the density function with the Signed Ray Distance Function (SRDF). Firstly, the SRDF is predicted by the network and transformed into a ray-conditioned density function for volume rendering. We argue that the ray-specific SRDF only considers the surface along the camera ray, so the density function derived from it is more consistent with the real occupancy than the one derived from the SDF. Secondly, although the SRDF and SDF represent different aspects of scene geometry, their values should share the same sign, which indicates the underlying spatial occupancy. Therefore, this work introduces an SRDF-SDF consistency loss to constrain the signs of the SRDF and SDF outputs. Thirdly, this work proposes a self-supervised visibility task, introducing physical visibility geometry to the reconstruction task. The visibility task combines priors from the predicted SRDF and SDF as pseudo labels and contributes to generating more accurate 3D geometry. Our method, implemented with different representations, has been validated on indoor datasets, achieving improved performance in both reconstruction and view synthesis.
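
The argument about neighboring objects can be checked numerically on a toy ray. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the 2D scene (a wall at x = 4 and a cylinder of radius 0.5 centered at (2, 0.6)), the value beta = 0.1, and the helper names `laplace_density` and `render_weights` are all hypothetical, and the density uses the Laplace-CDF form known from VolSDF because the paper's own density equations are not reproduced here. The ray passes 0.1 units from the cylinder without hitting it, so the SRDF only sees the wall.

```python
import numpy as np

def laplace_density(d, beta=0.1):
    """Assumed VolSDF-style density: sigma = (1/beta) * LaplaceCDF(-d)."""
    half_tail = 0.5 * np.exp(-np.abs(d) / beta)
    # Outside the surface (d >= 0) the density decays; inside it saturates at 1/beta.
    return np.where(d >= 0, half_tail, 1.0 - half_tail) / beta

def render_weights(sigma, delta):
    """Standard volume-rendering weights w_i = T_i * alpha_i."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # T_i is the product of (1 - alpha_j) over all samples j before i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha

# Samples along a ray from the origin in the +x direction.
t = np.linspace(0.0, 5.0, 501)
delta = t[1] - t[0]
points = np.stack([t, np.zeros_like(t)], axis=1)

# Hypothetical toy scene: wall at x = 4, cylinder that the ray narrowly misses.
wall_x = 4.0
cyl_center = np.array([2.0, 0.6])
cyl_radius = 0.5

d_wall = wall_x - t
d_cyl = np.linalg.norm(points - cyl_center, axis=1) - cyl_radius
sdf = np.minimum(d_wall, d_cyl)   # distance to the nearest surface in any direction
srdf = wall_x - t                 # distance to the first surface along the ray

w_sdf = render_weights(laplace_density(sdf), delta)
w_srdf = render_weights(laplace_density(srdf), delta)

# Weight mass accumulated around Q (t = 2), where the ray passes the cylinder.
near_q = np.abs(t - 2.0) < 0.5
mass_sdf_q = w_sdf[near_q].sum()
mass_srdf_q = w_srdf[near_q].sum()
peak_srdf_t = t[np.argmax(w_srdf)]

print(f"weight mass near Q from SDF density:  {mass_sdf_q:.3f}")
print(f"weight mass near Q from SRDF density: {mass_srdf_q:.2e}")
print(f"SRDF weight peak at t = {peak_srdf_t:.2f}")
```

In this setup the SDF-derived density places a large share of the rendering weight near Q even though no surface lies on the ray there, while the SRDF-derived weights concentrate around the wall.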

Paper Structure (24 sections, 13 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: Analysis of the density function and weight distribution for a toy scene. (a) Frontal and overhead views of a scene with multiple objects, where a ray originates from the camera center O and intersects the green rectangle at point P. Q is the point on the ray $\overrightarrow{OP}$ closest to the blue cylinder. (b) The SDF $d_\Omega$ and SRDF $\tilde{d}_\Omega$ (Eqs. \ref{eq:sdf} and \ref{eq:srdf_sdf}) along the ray $\overrightarrow{OP}$. (c) The volume density $\sigma$ generated from the SDF and SRDF in (b) using Eqs. \ref{eq:density_sdf} and \ref{eq:density}. (d) The weight distribution $T_i\alpha_i$ generated from the density in (c) via Eq. \ref{eq:rendering_color}. Although Q is far from the surface that the ray intersects, the SDF-derived density at Q produces a high weight in volume rendering, resulting in noisy rendering and reconstruction.
  • Figure 2: Our framework. A geometry MLP $f_g$ generates the SDF and geometry features $\mathbf{F}_g$ from the encoded position (and optionally grid features). Then $\mathbf{F}_g$, along with the viewing direction, passes through a color MLP $f_c$ to predict the color of each point. Notably, (1) our approach models the density function with the ray-specific SRDF in addition to the SDF. For this purpose, an SRDF MLP $f_s$ is introduced to generate the SRDF. (2) An SRDF-SDF consistency loss $\mathcal{L}_{con}$ is devised to align the signs of the generated SRDF and SDF. (3) To enhance the geometry prediction, a self-supervised visibility task is proposed, which integrates the geometry priors in the SRDF and SDF predicted by the network to generate the pseudo visibility ground truth. The visibility probability is predicted by the SRDF MLP.
  • Figure 3: Qualitative comparisons on ScanNet. Our method reconstructs more surfaces, especially in thin regions.
  • Figure 4: Qualitative comparisons on Replica. Compared to MonoSDF and Occ_SDF_Hybrid, our method generates more accurate surfaces.
  • Figure 5: Comparison of the resulting weights and images. The weights and images generated by our approach align more closely with the actual observations.
  • ...and 8 more figures
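
Item (2) in the Figure 2 caption describes a loss that aligns the signs of the predicted SRDF and SDF. The exact form of $\mathcal{L}_{con}$ is not given in this overview, so the following is only a hypothetical surrogate showing one way such a sign-agreement penalty could look. The function name `sign_consistency_penalty`, the tanh soft sign, and the `sharpness` parameter are assumptions, and NumPy is used for illustration only; a training setup would express the same idea in an autodiff framework.

```python
import numpy as np

def sign_consistency_penalty(srdf, sdf, sharpness=10.0):
    """Hypothetical sign-agreement penalty (not the paper's L_con).

    tanh acts as a smooth stand-in for sign(); the product of the two
    soft signs is positive when they agree and negative when they differ.
    """
    soft_sign_srdf = np.tanh(sharpness * srdf)
    soft_sign_sdf = np.tanh(sharpness * sdf)
    # Only disagreeing pairs (negative product) contribute to the penalty.
    return np.mean(np.maximum(0.0, -soft_sign_srdf * soft_sign_sdf))

# Same signs at every sample: no penalty.
agree = sign_consistency_penalty(np.array([0.3, -0.2]), np.array([0.1, -0.5]))

# Opposite signs at the second sample: positive penalty.
disagree = sign_consistency_penalty(np.array([0.3, -0.2]), np.array([0.1, 0.5]))

print(agree, disagree)
```

The penalty leaves the magnitudes of the two fields unconstrained, which matches the caption's point that the SRDF and SDF should share only their sign.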