Data Augmentation for NeRFs in the Low Data Limit

Ayush Gaggar, Todd D. Murphey

TL;DR

This work tackles the challenge of training Neural Radiance Fields (NeRFs) with very limited and partially observed data in robotic settings. It introduces a data augmentation framework that samples new training views from a posterior uncertainty distribution combining in-distribution (ID) entropy and out-of-distribution spatial coverage, implemented via rejection sampling on a hemispherical candidate set. The uncertainty distribution is defined as $U(r(s)) = H(r(s))_{ent} + D(r(s))_{dist}$, with $H$ capturing ID uncertainty and $D$ measuring SE(3) pose-distance to training views; sampling is performed by accepting candidate rays with probability proportional to this uncertainty. Empirical results on Blender scenes show the method outperforms state-of-the-art baselines in PSNR, LPIPS, and SSIM while exhibiting markedly lower variability, demonstrating strong data-efficiency and robustness for NeRFs in resource-constrained, partially observed environments. The approach is end-to-end and transferable to existing NeRF architectures without requiring pre-training, making it practical for real-world robotic applications where informative data is expensive or scarce.
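The sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the entropy term $H$ is a constant placeholder (in practice it would come from the NeRF's volumetric uncertainty estimator), and the coverage term $D$ is simplified from an SE(3) pose distance to the angle between camera positions on a unit hemisphere.

```python
import numpy as np

def sample_hemisphere(n, rng):
    """Draw n points uniformly on the upper unit hemisphere (z >= 0)."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    v[:, 2] = np.abs(v[:, 2])  # reflect lower half onto upper half
    return v

def coverage_term(candidates, train_views):
    """Simplified stand-in for D: angle to the nearest training view, scaled to [0, 1].

    Assumption: positions only; the paper uses a pose distance on SE(3).
    """
    cos = np.clip(candidates @ train_views.T, -1.0, 1.0)
    return np.arccos(cos).min(axis=1) / np.pi

def rejection_sample(uncertainty, k, rng, max_rounds=10_000):
    """Select up to k distinct candidate indices by rejection sampling.

    A uniformly proposed candidate i is accepted with probability U_i / max(U).
    """
    u_max = uncertainty.max()
    if u_max <= 0:
        raise ValueError("uncertainty must have a positive maximum")
    accept_prob = uncertainty / u_max
    chosen = []
    for _ in range(max_rounds):
        i = int(rng.integers(len(uncertainty)))  # uniform proposal
        if i in chosen:
            continue
        if rng.random() < accept_prob[i]:
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen

rng = np.random.default_rng(0)
train_views = sample_hemisphere(6, rng)
train_views[:, 0] = -np.abs(train_views[:, 0])  # initial views all on one half
candidates = sample_hemisphere(500, rng)
entropy = np.full(len(candidates), 0.1)  # placeholder for H, not a real estimate
U = entropy + coverage_term(candidates, train_views)  # U = H + D
new_idx = rejection_sample(U, k=6, rng=rng)
print(candidates[new_idx])  # positions of the augmented views
```

Because acceptance is probabilistic rather than a hard argmax, the selected views tend toward high-uncertainty regions while still varying from run to run, which is the property that distinguishes sampling from a distribution from next-best-view selection.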

Abstract

Current methods based on Neural Radiance Fields fail in the low data limit, particularly when training on incomplete scene data. Prior works augment training data only in next-best-view applications, which leads to hallucinations and model collapse with sparse data. In contrast, we propose adding a set of views during training by rejection sampling from a posterior uncertainty distribution, generated by combining a volumetric uncertainty estimator with spatial coverage. We validate our results on partially observed scenes; on average, our method performs 39.9% better with 87.5% less variability across established scene reconstruction benchmarks, as compared to state-of-the-art baselines. We further demonstrate that augmenting the training set by sampling from any distribution leads to better, more consistent scene reconstruction in sparse environments. This work is foundational for robotic tasks where augmenting a dataset with informative data is critical in resource-constrained, a priori unknown environments. Videos and source code are available at https://murpheylab.github.io/low-data-nerf/.

Paper Structure

This paper contains 19 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Uncertainty distributions (left) and scene reconstruction (right) for NeRFs with sparse training views. Per method, Left: Uncertainty distribution generated over the hemisphere bounding the object, with brighter colors corresponding to higher uncertainty. Right: Novel view reconstruction after training with sparse images (6 initial + 6 augmented). Comparisons with probabilistic methods (ours, Spatial Entropy [lee2022uncertainty], and FisherRF [jiang2023fisherrf]) are shown. Ground truth shows that the initial views are all taken from the left half of the hemisphere, such that the object is only partially observed and the right half is highly uncertain. After augmenting the training set, our method does the best job both in accounting for unseen regions in the uncertainty distribution field and in scene reconstruction quality. On the other hand, Spatial Entropy has high uncertainty even in observed regions (indicative of hallucinations), and FisherRF has low uncertainty across most of the hemisphere, even in unobserved regions (indicative of overfitting).
  • Figure 2: Common ways NeRFs fail in the low data limit. (a) Training views can be added along the hemisphere, outlined in gray; here, the Lego bulldozer object is shown. (b) Failure by hallucination, where rays are unable to learn depth properly and fail to create a single model; data augmented by the FisherRF method. (c) Failure by overfitting, where the model confidently predicts nothing in the scene; data augmented by the Spatial Entropy method. (d) Failure by occluded artifacts, where the model is unable to render with clarity; data augmented by adding views furthest from each other, i.e., maximally apart.
  • Figure 3: Scene reconstruction after 10k training iterations for three different objects and data augmentation methods. Across all scenes, only our method renders the model without visual artifacts. The scene is initially partially observed, with six training views all taken from the same half of the hemisphere; based on the data selection method, six additional views are added to the training set after 200 training iterations.
  • Figure 4: Evaluation results of standard image quality metrics across our method and three other SOTA baselines. Each metric score was evaluated across the 200 images in the evaluation dataset for each of the three scenes. A higher score is better for PSNR and SSIM, and a lower score is better for LPIPS. We achieve the best median performance and the lowest interquartile range of any method across each scene, except for material SSIM vs. Entropy. Our method performs better with a statistical significance of p < 0.05 and a Bonferroni correction of 3, except for lego LPIPS vs. Uniform, chair LPIPS vs. FisherRF, and chair SSIM vs. Uniform and FisherRF.