Ray-Distance Volume Rendering for Neural Scene Reconstruction
Ruihong Yin, Yunlu Chen, Sezer Karaoglu, Theo Gevers
TL;DR
This work tackles indoor neural scene reconstruction by replacing SDF-derived density with ray-specific SRDF-derived density, which avoids the spurious density that neighboring objects induce along a ray in multi-object scenes. The method jointly predicts the SDF and SRDF, renders with the SRDF-derived density, and enforces sign consistency between the SRDF and SDF via a differentiable loss, complemented by a self-supervised visibility task. The approach improves reconstruction and view synthesis on ScanNet, Replica, and Tanks and Temples, outperforming strong baselines such as MonoSDF and NeuRIS. Overall, the framework provides more plausible per-ray density distributions and geometry priors, enabling more accurate 3D surfaces and novel-view renderings in indoor environments.
Abstract
Existing methods in neural scene reconstruction utilize the Signed Distance Function (SDF) to model the density function. However, in indoor scenes, the density computed from the SDF for a sampled point may not consistently reflect its real importance in volume rendering, often due to the influence of neighboring objects. To tackle this issue, our work proposes a novel approach for indoor scene reconstruction, which instead parameterizes the density function with the Signed Ray Distance Function (SRDF). Firstly, the SRDF is predicted by the network and transformed to a ray-conditioned density function for volume rendering. We argue that the ray-specific SRDF only considers the surface along the camera ray, so the density function derived from it is more consistent with the real occupancy than the one derived from the SDF. Secondly, although the SRDF and SDF represent different aspects of scene geometry, their values should share the same sign, which indicates the underlying spatial occupancy. Therefore, this work introduces an SRDF-SDF consistency loss to constrain the signs of the SRDF and SDF outputs. Thirdly, this work proposes a self-supervised visibility task, introducing physical visibility geometry to the reconstruction task. The visibility task combines priors from the predicted SRDF and SDF as pseudo labels, and contributes to generating more accurate 3D geometry. Our method, implemented with different representations, has been validated on indoor datasets, achieving improved performance in both reconstruction and view synthesis.
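The argument in the abstract can be illustrated with a small toy example. The sketch below is not the authors' implementation: the scene (a sphere next to a ray that ends on a wall), the Laplace-CDF distance-to-density transform (borrowed from VolSDF-style rendering), the value of beta, and the tanh-based sign penalty are all assumptions chosen for illustration. It shows that a sample point passing close to an object the ray never hits receives noticeable density from the SDF but almost none from the SRDF, and that a sign-agreement penalty is zero when the two fields agree in sign.

```python
import numpy as np

# Toy scene (assumed for illustration): a unit sphere beside the ray and a wall at z = 5.
SPHERE_C = np.array([0.0, 1.2, 2.0])
SPHERE_R = 1.0
WALL_Z = 5.0

def scene_sdf(p):
    """Distance to the nearest surface in any direction (negative behind the wall)."""
    sphere = np.linalg.norm(p - SPHERE_C, axis=-1) - SPHERE_R
    wall = WALL_Z - p[..., 2]
    return np.minimum(sphere, wall)

def laplace_density(dist, beta=0.1):
    """Assumed VolSDF-style transform: (1/beta) * Laplace CDF evaluated at -dist."""
    s = -np.asarray(dist, dtype=float)
    tail = 0.5 * np.exp(-np.abs(s) / beta)      # numerically safe for both signs
    cdf = np.where(s <= 0, tail, 1.0 - tail)
    return cdf / beta

def sign_consistency_loss(sdf, srdf, k=10.0):
    """Hypothetical differentiable penalty that is positive only where the signs disagree."""
    return np.mean(np.maximum(0.0, -np.tanh(k * sdf) * np.tanh(k * srdf)))

# Ray from the origin along +z: it grazes the sphere (0.2 away at t = 2) and hits the wall at t = 5.
origin = np.zeros(3)
direction = np.array([0.0, 0.0, 1.0])
t = np.linspace(0.0, 6.0, 61)
points = origin + t[:, None] * direction

sdf = scene_sdf(points)
srdf = WALL_Z - t                 # signed distance along this ray to the surface it hits

sigma_sdf = laplace_density(sdf)
sigma_srdf = laplace_density(srdf)

near_object = np.argmin(np.abs(t - 2.0))   # sample closest to the sphere
at_surface = np.argmin(np.abs(t - 5.0))    # sample on the wall

print("density near object: SDF =", sigma_sdf[near_object], " SRDF =", sigma_srdf[near_object])
print("density at surface:  SDF =", sigma_sdf[at_surface], " SRDF =", sigma_srdf[at_surface])
print("sign loss (consistent):", sign_consistency_loss(sdf, srdf))
print("sign loss (flipped):   ", sign_consistency_loss(sdf, -srdf))
```

In this toy setup both fields give the same density at the wall, so the SRDF changes only how density is distributed along the ray before the surface it actually hits. In the method described above, both distances are predicted by a network rather than computed analytically.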
