Improving Neural Radiance Fields with Depth-aware Optimization for Novel View Synthesis

Shu Chen, Junyao Li, Yang Zhang, Beiji Zou

TL;DR

SfMNeRF addresses geometric ambiguity in NeRF under sparse inputs by jointly optimizing a neural radiance field with depth-aware constraints derived from self-supervised depth prediction. It combines explicit losses—epipolar constraint $L_{epi}$, positions-of-matched-features loss $L_{3D}$, patch-based photometric loss $L_{pc}$ (including $L_{pr}$ and $L_{SSIM}$), and depth-smooth loss $L_{ds}$—in a total objective $L_{total}=\lambda_{ren}L_{ren}+\lambda_{3D}L_{3D}+\lambda_{pr}L_{pr}+\lambda_{epi}L_{epi}+\lambda_{SSIM}L_{SSIM}+\lambda_{ds}L_{ds}$, along with sub-pixel rendering and a pyramidal training scheme. Across LLFF-NeRF, ScanNet, and DTU, SfMNeRF yields improved PSNR/SSIM and more accurate depth maps for sparse-view synthesis, demonstrating effective integration of explicit geometry priors with implicit radiance fields. The work highlights how depth and photometric consistency constraints can regularize NeRF without extra data, though challenges remain with repetitive indoor structures and generalization to large-scale scenes.
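The total objective above is a plain weighted sum of six loss terms. As a minimal sketch of that combination, assuming each term has already been reduced to a scalar: the `LossWeights` container, the `total_loss` helper, the dictionary keys, and all default weight values below are hypothetical placeholders, not values taken from the paper.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class LossWeights:
    """Lambda coefficients of L_total. Defaults are illustrative assumptions only."""
    ren: float = 1.0      # lambda_ren: NeRF rendering loss
    three_d: float = 0.1  # lambda_3D: positions-of-matched-features loss
    pr: float = 0.1       # lambda_pr: patch photometric term
    epi: float = 0.1      # lambda_epi: epipolar constraint
    ssim: float = 0.1     # lambda_SSIM: structural similarity term
    ds: float = 0.01      # lambda_ds: depth-smoothness loss


def total_loss(terms: Mapping[str, float], w: LossWeights) -> float:
    """Weighted sum of the six scalar loss terms named in the objective."""
    return (
        w.ren * terms["ren"]
        + w.three_d * terms["3D"]
        + w.pr * terms["pr"]
        + w.epi * terms["epi"]
        + w.ssim * terms["SSIM"]
        + w.ds * terms["ds"]
    )


# Example call with made-up per-term values.
example_terms = {"ren": 0.02, "3D": 0.5, "pr": 0.1, "epi": 0.3, "SSIM": 0.2, "ds": 0.05}
example_total = total_loss(example_terms, LossWeights())
```

Setting a weight to zero drops the corresponding constraint, which is a convenient way to run ablations over the individual terms.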

Abstract

With dense inputs, Neural Radiance Fields (NeRF) can render photo-realistic novel views of static scenes. Although the synthesis quality is excellent, existing NeRF-based methods fail to recover reasonably accurate three-dimensional (3D) structure. Given sparse inputs, novel view synthesis quality drops dramatically because the implicitly reconstructed 3D-scene structure is inaccurate. We propose SfMNeRF, a method that both synthesizes better novel views and reconstructs the 3D-scene geometry. SfMNeRF leverages knowledge from self-supervised depth estimation methods to constrain the 3D-scene geometry during view synthesis training. Specifically, SfMNeRF employs epipolar, photometric consistency, depth smoothness, and position-of-matches constraints to explicitly reconstruct the 3D-scene structure. Through these explicit constraints and the implicit constraint from NeRF, our method improves both the view synthesis and the 3D-scene geometry of NeRF. In addition, SfMNeRF synthesizes novel sub-pixels whose ground truth is obtained by image interpolation. This strategy gives SfMNeRF more training samples and thereby improves generalization. Experiments on two public datasets demonstrate that SfMNeRF surpasses state-of-the-art approaches. Code is available at https://github.com/XTU-PR-LAB/SfMNeRF
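The abstract states that ground truth for novel sub-pixels comes from image interpolation, without naming the interpolation scheme. A minimal sketch of one common choice, bilinear interpolation on a single-channel image stored as nested lists, might look as follows. The function name `bilinear_sample` and the choice of bilinear weights are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def bilinear_sample(image: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Interpolate image[row][col] at continuous coordinates (x = column, y = row).

    Assumption: bilinear weights; the source only says "image interpolation".
    """
    height, width = len(image), len(image[0])
    if not (0.0 <= x <= width - 1 and 0.0 <= y <= height - 1):
        raise ValueError("sub-pixel location lies outside the image")

    # Integer corners surrounding (x, y), clamped at the right/bottom border.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)

    # Fractional offsets inside the pixel cell.
    ax, ay = x - x0, y - y0

    top = (1.0 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1.0 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1.0 - ay) * top + ay * bottom


# A 2x2 toy image; the centre of the cell averages all four pixels.
toy = [[0.0, 1.0],
       [2.0, 3.0]]
centre_value = bilinear_sample(toy, 0.5, 0.5)  # 1.5
```

In a training loop, such interpolated values would serve as colour targets for rays cast through non-integer pixel locations, which is how the extra samples mentioned above could be obtained.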

Paper Structure (19 sections, 16 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Overview of SfMNeRF. This picture depicts how the photometric consistency loss and epipolar loss are implemented.
  • Figure 2: Overview of SfMNeRF. This picture depicts how the positions-of-matched-features constraint is implemented. $S$ and $S'$ are a pair of matched features.
  • Figure 3: Elimination of implausible epipolar points. (a) The reference image with a white point. (b) The epipolar line is depicted by the white line in another image, and the white point is the corresponding point. (c) The epipolar points retained after filtering, shown as white points in the image.
  • Figure 4: Qualitative comparison between our SfMNeRF and other approaches on the LLFF-NeRF dataset.
  • Figure 5: Qualitative comparison between our SfMNeRF and other approaches on the ScanNet dataset.
  • ...and 2 more figures