
NeRFs are Mirror Detectors: Using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives

Leif Van Holland, Michael Weinmann, Jan U. Müller, Patrick Stotko, Reinhard Klein

TL;DR

The paper tackles the challenge of modeling mirroring surfaces in neural radiance fields by proposing NeRF-MD, which detects mirrors from photometric inconsistencies during standard NeRF training and learns explicit 3D primitives for the mirror geometry. It introduces a three-stage pipeline: a baseline NeRF trained with a depth reprojection loss, a structural similarity (SSIM)-based score that localizes candidate mirror regions, and a joint optimization that blends primary and reflected rays via differentiable masks. The main contributions are automatic mirror localization without mask annotations, a differentiable framework for rendering reflections from explicit mirror geometry, and improvements over baselines on synthetic and real data, particularly in mirror regions. This work advances practical mirror-aware NeRFs, enabling faithful multi-view reconstruction of scenes with reflective surfaces without manual annotations.
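
The blending of primary and reflected rays can be illustrated with a minimal sketch, assuming a single planar rectangular mirror described by a center, two orthonormal in-plane tangents, a unit normal, and half-extents. Only the mirror-reflection formula d' = d - 2(d·n)n is standard. The helper names (`reflect_ray`, `soft_rect_mask`, `blend`) and the sigmoid-based antialiased mask with its `sharpness` parameter are hypothetical illustrations, not the paper's implementation.

```python
import math

Vec = tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def axpy(s: float, a: Vec, b: Vec) -> Vec:
    """Return s * a + b."""
    return (s * a[0] + b[0], s * a[1] + b[1], s * a[2] + b[2])


def reflect_ray(origin: Vec, direction: Vec, center: Vec, normal: Vec):
    """Intersect a ray with the mirror plane; return (hit, reflected_dir) or None.

    `normal` is assumed to be unit length.
    """
    denom = dot(direction, normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the mirror plane
    t = dot(axpy(-1.0, origin, center), normal) / denom
    if t <= 0.0:
        return None  # plane lies behind the ray origin
    hit = axpy(t, direction, origin)
    reflected = axpy(-2.0 * denom, normal, direction)  # d' = d - 2 (d.n) n
    return hit, reflected


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_rect_mask(hit: Vec, center: Vec, tangent_u: Vec, tangent_v: Vec,
                   half_u: float, half_v: float, sharpness: float = 50.0) -> float:
    """Hypothetical antialiased mask: ~1 inside the rectangle, ~0 outside, 0.5 on the edge."""
    offset = axpy(-1.0, center, hit)
    u, v = dot(offset, tangent_u), dot(offset, tangent_v)
    signed_dist = min(half_u - abs(u), half_v - abs(v))  # positive inside
    return sigmoid(sharpness * signed_dist)


def blend(primary_rgb: Vec, reflected_rgb: Vec, mask: float) -> Vec:
    """Per-channel blend: (1 - m) * C_primary + m * C_reflected."""
    return tuple((1.0 - mask) * p + mask * r
                 for p, r in zip(primary_rgb, reflected_rgb))
```

A smooth mask is used here only because it keeps the blend weight differentiable with respect to the mirror's position and extent; a hard inside/outside test would give zero gradients almost everywhere.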

Abstract

While neural radiance fields (NeRFs) have led to a breakthrough in photorealistic novel view synthesis, mirroring surfaces remain a particular challenge because they introduce severe inconsistencies into the scene representation. Previous approaches either focus on reconstructing single reflective objects or rely on strong supervision in the form of user-provided annotations of the image regions in which mirrors are visible, which limits their practical usability. In contrast, we present NeRF-MD, a method which shows that NeRFs can act as mirror detectors and which reconstructs neural radiance fields of scenes containing mirroring surfaces without prior annotations. To this end, we first compute an initial estimate of the scene geometry by training a standard NeRF with a depth reprojection loss. Our key insight is that, at this stage, the parts of the scene corresponding to a mirroring surface still exhibit significant photometric inconsistency, whereas the remaining parts are already reconstructed plausibly. This allows us to detect mirror surfaces by fitting geometric primitives to these inconsistent regions during the initial stage of training. Using this information, we then jointly optimize the radiance field and the mirror geometry in a second training stage to refine their quality. We show that our method faithfully detects mirrors in the scene and reconstructs a single consistent scene representation, and we demonstrate its potential in comparison with baseline and mirror-aware approaches.
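
The detection idea, that mirror regions remain photometrically inconsistent after the first training stage, can be sketched as a per-pixel SSIM map between a rendered training view and the corresponding input image. This is a minimal pure-Python sketch assuming grayscale images in [0, 1], a uniform (2·radius+1)² window with clamped borders, and the standard SSIM constants; the function names and the `threshold` value are hypothetical. The paper's score s(r) additionally uses depth variance, which is omitted here because its exact combination with SSIM is not specified.

```python
Image = list[list[float]]


def local_ssim(x: Image, y: Image, i: int, j: int, radius: int = 1,
               data_range: float = 1.0) -> float:
    """SSIM of the (2*radius+1)^2 window around pixel (i, j), uniform weights."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    h, w = len(x), len(x[0])
    xs, ys = [], []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ii = min(max(i + di, 0), h - 1)  # clamp windows at the image border
            jj = min(max(j + dj, 0), w - 1)
            xs.append(x[ii][jj])
            ys.append(y[ii][jj])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n
    vy = sum((b - my) * (b - my) for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def inconsistency_map(rendered: Image, target: Image, radius: int = 1) -> Image:
    """Per-pixel dissimilarity 1 - SSIM; high values mark photometric inconsistency."""
    h, w = len(target), len(target[0])
    return [[1.0 - local_ssim(rendered, target, i, j, radius) for j in range(w)]
            for i in range(h)]


def candidate_pixels(score: Image, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Pixels whose dissimilarity exceeds a (hypothetical) threshold."""
    return [(i, j) for i, row in enumerate(score)
            for j, s in enumerate(row) if s > threshold]
```

Identical images give a dissimilarity of zero everywhere, whereas a checkerboard compared against its inverse has negative local covariance, so SSIM turns negative and every pixel exceeds the threshold.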

Paper Structure (14 sections, 8 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Overview of our pipeline. (a) A collection of training images is first used to train a standard NeRF model with an additional depth reprojection loss. (b) Per-pixel scores are then computed from SSIM and depth variance values. High-scoring pixels are unprojected into 3D, and the resulting point cloud is segmented into primitive shapes. (c) Finally, a modified rendering pipeline jointly optimizes the NeRF and mirror parameters by blending primary and reflected images based on antialiased mirror masks that are generated in a differentiable manner.
  • Figure 2: Examples of the intermediate values used to generate the final score $s(r)$.
  • Figure 3: The proposed $p$-norm schedule (solid blue line) compared to the usual $L_2$ optimization (dotted green line). The images show a zoomed-in mirror region after $\tau_\text{inc}$ iterations.
  • Figure 4: Results on the test set of two synthetic scenes compared to the best baseline method according to Table 1.
  • Figure 5: Results on the test set of two real-world scenes compared to the best baseline method according to Table 1.
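
Step (b) of the pipeline unprojects high-scoring pixels into 3D and segments the resulting point cloud into primitive shapes. One way to do this is sketched below, assuming a pinhole camera with intrinsics (fx, fy, cx, cy), a camera-to-world pose (R, t), depth measured along the optical axis, and a single planar mirror. RANSAC plane fitting is a standard stand-in chosen for illustration; the paper's actual segmentation procedure may differ, and all names and parameters (`unproject`, `fit_plane_ransac`, `inlier_tol`, `iterations`) are hypothetical.

```python
import math
import random

Point = tuple[float, float, float]


def unproject(u: float, v: float, depth: float, fx: float, fy: float,
              cx: float, cy: float, R, t) -> Point:
    """Pixel (u, v) with z-depth -> world point, using p_world = R @ p_cam + t."""
    xc = (u - cx) / fx * depth
    yc = (v - cy) / fy * depth
    zc = depth
    return tuple(R[k][0] * xc + R[k][1] * yc + R[k][2] * zc + t[k] for k in range(3))


def plane_from_points(p0: Point, p1: Point, p2: Point):
    """Plane (n, d) with unit normal n and n . x + d = 0, or None if degenerate."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    n = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None  # the three samples are collinear
    n = [c / norm for c in n]
    d = -sum(n[k] * p0[k] for k in range(3))
    return n, d


def fit_plane_ransac(points: list[Point], iterations: int = 200,
                     inlier_tol: float = 0.01, seed: int = 0):
    """Return the plane with the most inliers and those inliers (hypothetical defaults)."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points
                   if abs(sum(n[k] * p[k] for k in range(3)) + d) < inlier_tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = plane, inliers
    return best, best_inliers
```

With several mirrors, the same routine could be applied repeatedly, removing each plane's inliers before the next fit; bounding the plane to a rectangle would be a further step not shown here.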