RobustNeRF: Ignoring Distractors with Robust Losses

Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, Andrea Tagliasacchi

TL;DR

RobustNeRF tackles the core NeRF vulnerability to non-persistent distractors by treating distractors as outliers in the optimization objective. It adopts a trimmed least-squares framework within an iteratively reweighted scheme, enhanced with spatially coherent outlier masking that evolves during training, enabling the model to ignore transient content while learning static scene structure. The approach is simple to integrate into existing NeRF pipelines, requires minimal hyperparameter tuning, and yields strong quantitative gains over mip-NeRF 360 and competitive results against D$^2$NeRF on real and synthetic distractor-rich datasets. While it introduces some statistical inefficiency on clean data and longer training times, RobustNeRF demonstrates robust reconstruction in cluttered environments and lays groundwork for further improvements such as learned weighting or applying the loss to other NeRF variants.
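To make the idea of trimmed least squares inside an iteratively reweighted loop concrete, here is a minimal sketch on a toy problem: estimating a single scalar from noisy observations that include a few gross outliers. It is not the paper's NeRF training code. The function name `trimmed_irls_location`, the `keep_fraction` parameter, and the choice of keeping the half of the data with the smallest residuals are assumptions made for illustration.

```python
def trimmed_irls_location(values, keep_fraction=0.5, iterations=10):
    """Estimate a scalar location by iteratively reweighted, trimmed least squares.

    Illustrative toy only: each round, observations whose absolute residual is
    above the `keep_fraction` quantile get weight 0 (treated as outliers), the
    rest get weight 1, and the estimate is refit on the kept observations.
    """
    estimate = sum(values) / len(values)  # plain least-squares starting point
    weights = [1.0] * len(values)
    for _ in range(iterations):
        residuals = [abs(v - estimate) for v in values]
        # Threshold is taken from the residuals themselves, so at least one
        # observation is always kept and the weighted mean is well defined.
        cutoff = sorted(residuals)[int(keep_fraction * (len(values) - 1))]
        weights = [1.0 if r <= cutoff else 0.0 for r in residuals]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate, weights


# Six observations near 1.0 plus two gross outliers (hypothetical data).
observations = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 10.0, 12.0]
estimate, weights = trimmed_irls_location(observations)
print(estimate, weights)
```

On this data the two outliers receive zero weight, but so do two of the clean observations, because the trimmed estimator always discards a fixed fraction of the data. That is a small-scale view of the statistical inefficiency on clean data mentioned above.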

Abstract

Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. Our method successfully removes outliers from a scene and improves upon our baselines, on synthetic and real-world scenes. Our technique is simple to incorporate in modern NeRF frameworks, with few hyper-parameters. It does not assume a priori knowledge of the types of distractors, and is instead focused on the optimization problem rather than pre-processing or modeling transient objects. More results on our page https://robustnerf.github.io.

Paper Structure (40 sections, 9 equations, 24 figures)

Figures (24)

  • Figure 1: NeRF assumes photometric consistency in the observed images of a scene. Violations of this assumption, as with the images in the top row, yield reconstructed scenes with inconsistent content in the form of "floaters" (highlighted with ellipses). We introduce a simple technique that produces clean reconstructions by automatically ignoring distractors without explicit supervision.
  • Figure 2: Ambiguity -- A simple 2D scene where a static object (blue) is captured by three cameras. During the first and third captures the scene is not photo-consistent, as a distractor was within the field of view. Portions of the scene that are not photo-consistent can end up being encoded as view-dependent effects -- even when we assume ground truth geometry.
  • Figure 3: Histograms -- Robust estimators perform well when the distribution of residuals agrees with the one implied by the estimator (e.g., Gaussian for L2, Laplacian for L1). Here we visualize the ground-truth distribution of residuals (bottom-left), which is hardly a good match with any simple parametric distribution.
  • Figure 4: Kernels -- (top-left) Family of robust kernels (Barron's general robust loss), including L2 ($\alpha{=}{2}$), Charbonnier ($\alpha{=}{1}$) and Geman-McClure ($\alpha{=}{-2}$). (top-right) Mid-training, residual magnitudes are similar for distractors and fine-grained details, and pixels with large residuals are learned more slowly, as the gradient of re-descending kernels flattens out. (bottom-right) A Geman-McClure kernel that down-weights large residuals too aggressively removes both outliers and high-frequency detail. (bottom-left) A less aggressive Geman-McClure kernel does not effectively remove outliers.
  • Figure 5: Algorithm -- We visualize our weight function, computed from the residuals, on two examples: (top) the residuals of a (mid-training) NeRF rendered from a training viewpoint, (bottom) a toy residual image containing residuals of small spatial extent (dot, line) and residuals of large spatial extent (squares). Notice that residuals with large magnitude but small spatial extent (texture of the box, dot, line) are included in the optimization, while weaker residuals with larger spatial extent are excluded. Note that while we operate on patches, we visualize the weight function on the whole image to facilitate visualization.
  • ...and 19 more figures
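
The behaviour described in the Figure 5 caption (an isolated strong residual is kept, a weaker but spatially extended one is dropped) can be reproduced on a toy residual image. The sketch below is an assumption-laden illustration, not the paper's procedure: it uses a median cutoff for the per-pixel inlier test, a 3x3 box filter to spread inlier votes to neighbours, and a 0.5 vote threshold. The helper names `box_filter` and `spatial_inlier_weights` and all parameter values are hypothetical.

```python
from statistics import median


def box_filter(image, radius=1):
    """Mean over a (2*radius+1)^2 window, clipped at the image border."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def spatial_inlier_weights(residuals, radius=1, vote_threshold=0.5):
    """Binary weights: a pixel is kept if most of its neighbourhood are inliers."""
    cutoff = median(r for row in residuals for r in row)
    # Step 1: per-pixel trimmed test (inlier if residual is at or below the median).
    per_pixel = [[1.0 if r <= cutoff else 0.0 for r in row] for row in residuals]
    # Step 2: smooth the inlier mask so the decision depends on spatial extent.
    votes = box_filter(per_pixel, radius)
    return [[1.0 if v >= vote_threshold else 0.0 for v in row] for row in votes]


# Toy 8x8 residual image (hypothetical): one strong dot, one weaker 3x3 square.
residuals = [[0.0] * 8 for _ in range(8)]
residuals[1][1] = 1.0
for y in range(4, 7):
    for x in range(4, 7):
        residuals[y][x] = 0.3

weights = spatial_inlier_weights(residuals)
for row in weights:
    print("".join("#" if w else "." for w in row))
```

With these settings the dot at (1, 1) is re-included because its neighbours are all inliers, while the centre of the square is excluded because its whole neighbourhood fails the per-pixel test. The corners of the square still pass the vote in this simplified version, which is one reason a real implementation would need a more careful neighbourhood rule.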