NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald

TL;DR

This work proposes a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and demonstrates the benefits of this approach to model performance and robustness.

Abstract

The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and demonstrate the benefits of our approach to model performance and robustness. Our data augmentation pipeline, which we call NeRFmentation, trains NeRFs on each scene in a dataset, filters out subpar NeRFs based on relevant metrics, and uses them to generate synthetic RGB-D images captured from new viewing directions. In this work, we apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset KITTI, augmenting the training set of its Eigen split. We evaluate the resulting performance gain on the original test set, a separate popular driving dataset, and our own synthetic test set.

Paper Structure (29 sections, 3 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 2: Our proposed pipeline: (1) Train NeRFs for each scene in the MDE dataset, reserving images for quality evaluation. (2) Filter out subpar NeRFs. (3) Render novel views by perturbing original poses. (4) Combine novel and original views to create the NeRFmented training dataset for the MDE network. Source dataset: KITTI [geiger_kitti_2013].
  • Figure 3: Qualitative comparison of depth-nerfacto vs. depth-nerfacto-huge reconstruction on KITTI [geiger_kitti_2013]. The figure shows reconstructions of training images using both depth-nerfacto and depth-nerfacto-huge. On average, depth-nerfacto-huge produces sharper and more accurate RGB and depth images than depth-nerfacto.
  • Figure 4: Qualitative NeRF reconstruction results on KITTI [geiger_kitti_2013]. Original KITTI images are compared with those generated by trained and filtered NeRFs for matching camera poses. The reconstructed RGB images closely resemble the originals, while the NeRFs also complete the sparse ground truth depth maps.
  • Figure 5: Qualitative results on the Waymo [sun_scalability_2020] dataset, focusing on close-up details. The close-ups compare the predictions of the vanilla-trained AdaBins [bhat_adabins_2021] with those of our proposed NeRFmented AdaBins, showing that our method recovers fine-grained details that the baseline misses. Color scale: 0 (purple) to 80 meters (yellow).
  • Figure 6: Qualitative results on the KITTI [geiger_kitti_2013] dataset. The depth predictions of the vanilla-trained DepthFormer [li2023depthformer] are compared with those of our proposed NeRFmented DepthFormer, showing that our method recovers fine-grained details that the baseline misses. Color scale: 0 (purple) to 80 meters (yellow).
  • ...and 7 more figures
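
Steps (2) to (4) of the pipeline in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the planar `Pose` type, the single per-scene quality score with a threshold, the uniform sampling of a lateral shift and a yaw offset, and all function and parameter names are assumptions made for illustration. The text above does not specify which metrics are used for filtering or how poses are perturbed, and NeRF training and rendering are left out entirely.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical planar camera pose: position (x, z) in meters, yaw in radians."""
    x: float
    z: float
    yaw: float


def filter_scenes(scene_scores, min_score):
    """Step 2 (assumed form): keep scenes whose held-out quality score
    reaches a threshold. The actual metrics are not given in the text above."""
    return [name for name, score in scene_scores.items() if score >= min_score]


def perturb_pose(pose, max_shift_m, max_yaw_rad, rng):
    """Step 3 (assumed form): sample a novel pose near an original one by
    shifting the camera sideways and rotating it about the vertical axis."""
    shift = rng.uniform(-max_shift_m, max_shift_m)
    dyaw = rng.uniform(-max_yaw_rad, max_yaw_rad)
    # With viewing direction (sin yaw, cos yaw) in (x, z), the sideways
    # direction is (cos yaw, -sin yaw).
    return Pose(
        x=pose.x + shift * math.cos(pose.yaw),
        z=pose.z - shift * math.sin(pose.yaw),
        yaw=pose.yaw + dyaw,
    )


def build_augmented_poses(original_poses, views_per_pose, max_shift_m, max_yaw_rad, seed=0):
    """Step 4 (poses only): original poses followed by the perturbed ones."""
    rng = random.Random(seed)
    novel = [
        perturb_pose(pose, max_shift_m, max_yaw_rad, rng)
        for pose in original_poses
        for _ in range(views_per_pose)
    ]
    return list(original_poses) + novel
```

In a full pipeline, each novel pose would be passed to the trained NeRF of its scene to render an RGB-D pair, and only scenes that pass `filter_scenes` would contribute views.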