RANRAC: Robust Neural Scene Representations via Random Ray Consensus

Benno Buschmann; Andreea Dogaru; Elmar Eisemann; Michael Weinmann; Bernhard Egger

TL;DR

RANRAC tackles data inconsistencies that plague neural scene representations by replacing robust losses with a RANSAC-like random ray consensus framework. It extends RANSAC to high-dimensional, data-driven models, for both light field networks and neural radiance fields, by sampling hypotheses from random subsets of rays, inferring latent codes, rendering predictions, and validating consensus without assuming clean data. The method yields substantial improvements over baselines and prior robust approaches on synthetic and real datasets, including scenarios with occlusions, pose errors, and blur, and supports single-shot as well as multi-view reconstruction. With parallel hypothesis processing and a practical runtime on standard GPUs, RANRAC offers a versatile path toward artifact-free, robust neural scene reconstruction under real-world conditions.
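One way to picture the parallel validation step is to render every hypothesis for all rays of the input view and score them in a single batched operation. The sketch below assumes NumPy, an array layout of (hypotheses, rays, RGB), and a hypothetical function name `score_hypotheses`; it only illustrates the consensus count described for Figure 2 (Euclidean RGB distance within a margin, counted on the remaining rays) and is not the authors' implementation.

```python
import numpy as np


def score_hypotheses(predicted, observed, margin, sample_mask=None):
    """Score H hypotheses against one input view in a single batched pass.

    Hypothetical helper (not from the paper).
    predicted:   (H, R, 3) RGB colors rendered by each hypothesis for R rays.
    observed:    (R, 3) RGB colors of the input image.
    margin:      maximum Euclidean RGB distance for a ray to count as explained.
    sample_mask: optional (H, R) bool array marking the rays each hypothesis
                 was inferred from; these are left out of the consensus count.
    Returns (best index, consensus mask of the best hypothesis, counts).
    """
    # Broadcast the observation against every hypothesis: distances are (H, R).
    dist = np.linalg.norm(predicted - observed[None, :, :], axis=-1)
    explained = dist <= margin
    # If the sampled rays are known, only the remaining rays vote.
    voting = explained if sample_mask is None else explained & ~sample_mask
    counts = voting.sum(axis=1)
    best = int(np.argmax(counts))  # ties resolve to the lowest index
    return best, explained[best], counts


# Toy check: hypothesis 1 reproduces the observation exactly.
obs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pred = np.stack([obs + 0.5, obs.copy(), np.zeros_like(obs)])
best, mask, counts = score_hypotheses(pred, obs, margin=0.1)
print(best, counts.tolist())  # prints: 1 [0, 3, 0]
```

Because all hypotheses share the same observation, broadcasting avoids a Python loop over hypotheses; on a GPU the same pattern maps directly onto batched tensor operations.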

Abstract

Learning-based scene representations such as neural radiance fields or light field networks, which rely on fitting a scene model to image observations, commonly encounter challenges when the images contain inconsistencies caused by occlusions, inaccurately estimated camera parameters, or effects like lens flare. To address this challenge, we introduce RANdom RAy Consensus (RANRAC), an efficient approach to eliminating the effect of inconsistent data, taking inspiration from classical RANSAC-based outlier detection for model fitting. In contrast to down-weighting the effect of outliers with robust loss formulations, our approach reliably detects and excludes inconsistent perspectives, resulting in clean images without floating artifacts. For this purpose, we formulate a fuzzy adaptation of the RANSAC paradigm, enabling its application to large-scale models. We interpret the minimal number of samples needed to determine the model parameters as a tunable hyperparameter, investigate the generation of hypotheses with data-driven models, and analyze the validation of hypotheses in noisy environments. We demonstrate the compatibility and potential of our solution both for photo-realistic robust multi-view reconstruction from real-world images based on neural radiance fields and for single-shot reconstruction based on light field networks. In particular, the results indicate significant improvements over state-of-the-art robust methods for novel-view synthesis on both synthetic and captured scenes with various inconsistencies, including occlusions, noisy camera pose estimates, and unfocused perspectives. The results further indicate significant improvements for single-shot reconstruction from occluded images. Project Page: https://bennobuschmann.com/ranrac/
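Why the minimal number of samples matters as a hyperparameter can be seen from the textbook RANSAC bound: with inlier ratio w, sample size s and target confidence p, the number of random draws needed so that at least one sample is outlier-free is N = ceil(log(1 - p) / log(1 - w^s)). The sketch below evaluates this classical formula; the function name `ransac_iterations` is hypothetical, and the paper's fuzzy adaptation and its own convergence analysis may differ from this bound.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int,
                      confidence: float = 0.99) -> int:
    """Classical RANSAC bound (not the paper's derivation): smallest N with
    1 - (1 - w**s)**N >= confidence."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must lie in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    # Probability that a single random minimal sample contains no outliers.
    p_clean = inlier_ratio ** sample_size
    if p_clean >= 1.0:
        return 1
    if p_clean == 0.0:
        raise OverflowError("w**s underflowed; the iteration count is astronomically large")
    # log1p keeps precision when p_clean is tiny (large sample sizes).
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_clean))


# The required number of hypotheses grows exponentially with the sample size.
for s in (4, 8, 16, 32):
    print(s, ransac_iterations(0.75, s))
```

For a neural field, a single ray barely constrains the model, so s must be fairly large; this bound shows why simply demanding outlier-free minimal samples quickly becomes impractical at that scale.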

Paper Structure (14 sections, 2 equations, 5 figures, 2 tables)

Introduction
Related Work
Method
RANSAC Convergence on Complex Models
Random Sampling Neural Fields
Robust Light Field Networks
Robust Neural Radiance Fields
Hyperparameters
Implementation & Preprocessing
Experiments
Inconsistencies, Baselines & Datasets
Evaluation
Limitations & Future Work
Conclusion

Figures (5)

Figure 1: We propose a robust algorithm for 3D reconstruction from occluded input perspectives that is based on the random sampling of hypotheses. Our algorithm is general, and we demonstrate its use for single-shot reconstruction with light field networks and for multi-view reconstruction with NeRF. In both cases, it successfully removes the artifacts that normally arise from occluded input perspectives.
Figure 2: The RANRAC algorithm for LFNs samples random hypotheses by selecting a random set of samples from the given perspective (a) and inferring the latent representation of these rays with the autodecoder of a pretrained LFN (b). The resulting light field is then used to predict an image from the input perspective (c). Confidence in each random hypothesis is evaluated via the Euclidean distance between the predicted ray colors and the remaining color samples of the input image; the number of samples each hypothesis explains up to some margin determines the best hypothesis (d). All samples explained by the selected hypothesis are then used for a final inference with the LFN to obtain the final model and latent representation (e).
Figure 3: RANRAC (solid lines) improves PSNR and SSIM (higher is better) for occluded inputs compared to vanilla LFNs (dashed lines). The same hyperparameter configuration and LFN are used for all classes. In the left and middle plots, the amount of image occlusion increases while object occlusion is held constant at 25%. In the right plot, object occlusion increases while image occlusion is kept low. For the car class, a large improvement is observed over the entire occlusion spectrum. For the plane class, the improvement is similarly significant, but absolute performance degrades somewhat sooner. This stems from the smaller object size: as image occlusion increases, the occluded fraction of the object grows faster. For the chair class, the improvement is less significant, but structural similarity is preserved for much longer. For the plane and car classes, reconstruction quality is resilient to information loss (right) up to ~50%, beyond which the decline accelerates. With the low amount of image occlusion, the improvement is not significant for the chair class (consistent with the left and middle plots).
Figure 4: On the left, we show the qualitative effect of increasing occlusion on the same observation for the reconstruction of a novel view. LFN reconstructions break down globally early on, whereas RANRAC still provides a solid reconstruction, only gradually introducing minor, local (and comprehensible) artifacts for completely hidden object parts. We further show the obtained consensus set used for the final reconstruction (green: inliers, red: outliers). On the right, we show further qualitative results for novel view synthesis on different classes, together with the corresponding consensus sets.
Figure 5: Occlusions lead to clearly visible artifacts in NeRF reconstructions; RANRAC removes these artifacts completely. While RobustNeRF struggles with view-dependent and high-frequency details, RANRAC reconstructs them reliably.
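The loop in Figure 2 (a)-(e) can be sketched end to end on a toy problem. Since no pretrained LFN autodecoder is available here, the sketch swaps in a hypothetical stand-in: `fit_latent` fits an affine color model over 2D pixel coordinates by least squares, and `render` evaluates it. These names, the toy model, and the defaults (`n_samples`, `n_iters`, `margin`) are assumptions for illustration only; the method in Figure 2 infers an LFN latent code and renders with the network instead.

```python
import numpy as np


def fit_latent(coords, colors):
    """Hypothetical stand-in for autodecoder inference: least-squares affine
    color model over 2D pixel coordinates, stored as a (3, 3) 'latent'."""
    design = np.hstack([coords, np.ones((len(coords), 1))])
    latent, *_ = np.linalg.lstsq(design, colors, rcond=None)
    return latent


def render(latent, coords):
    """Hypothetical stand-in for rendering the input view from a latent."""
    design = np.hstack([coords, np.ones((len(coords), 1))])
    return design @ latent


def ranrac(coords, colors, n_samples=8, n_iters=64, margin=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_rays = len(coords)
    best_count, best_mask = -1, None
    for _ in range(n_iters):
        # (a) draw a random subset of rays from the input perspective
        idx = rng.choice(n_rays, size=n_samples, replace=False)
        # (b) infer a latent representation from these rays only
        latent = fit_latent(coords[idx], colors[idx])
        # (c) predict the full input image under this hypothesis
        pred = render(latent, coords)
        # (d) count the remaining rays explained within the margin
        explained = np.linalg.norm(pred - colors, axis=1) <= margin
        remaining = np.ones(n_rays, dtype=bool)
        remaining[idx] = False
        count = int((explained & remaining).sum())
        if count > best_count:
            best_count, best_mask = count, explained
    # (e) final inference on every ray explained by the best hypothesis
    final_latent = fit_latent(coords[best_mask], colors[best_mask])
    return final_latent, best_mask


# Toy scene: a smooth color gradient with a red "occluder" in one corner.
xs = np.linspace(0.0, 1.0, 20)
gx, gy = np.meshgrid(xs, xs)
coords = np.stack([gx.ravel(), gy.ravel()], axis=1)
clean = np.stack([0.2 + 0.5 * coords[:, 0],
                  0.3 + 0.4 * coords[:, 1],
                  np.full(len(coords), 0.5)], axis=1)
occluded = (coords[:, 0] < 0.3) & (coords[:, 1] < 0.3)
colors = clean.copy()
colors[occluded] = [1.0, 0.0, 0.0]

latent, consensus = ranrac(coords, colors)
print(int(consensus.sum()), "of", len(coords), "rays in the consensus set")
```

Hypotheses are ranked only on the rays they were not inferred from, as in step (d), while the returned consensus mask also keeps any sampled rays the winning hypothesis explains, so they take part in the final inference. With a real LFN, steps (b) and (c) dominate the cost, and independent iterations can run in parallel.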
