PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan

TL;DR

This work tackles single-view 3D reconstruction by exploiting two-bounce light paths captured with a SPAD lidar to recover both visible and occluded geometry without data priors. PlatoNeRF models two-bounce light within a NeRF framework, supervising primary and secondary rays with lidar transients and optimizing a depth loss in which the transient peak time is $t_{peak}=d/c$, where $d=d_1+d_2+d_3$ is the total two-bounce path length and $c$ is the speed of light. It improves depth accuracy, remains robust under ambient light and low-albedo backgrounds, and generalizes to lower spatial and temporal resolutions, outperforming the BF Lidar and RGB-shadow baselines on simulated and real data. The authors release synthetic data, code, and checkpoints to promote reproducibility and future work on combining lidar and RGB with neural rendering for textured geometry.

Abstract

3D reconstruction from a single view is challenging because of the ambiguity of monocular cues and the lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, typically rely on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded regions, which may not be physically accurate, or shadows observed by RGB cameras, which are difficult to detect in ambient light and against low-albedo backgrounds. We propose using time-of-flight data captured by a single-photon avalanche diode to overcome these limitations. Our method models two-bounce optical paths with NeRF, using lidar transient data for supervision. By leveraging the advantages of both NeRF and two-bounce light measured by lidar, we demonstrate that we can reconstruct visible and occluded geometry without data priors or reliance on controlled ambient lighting or scene albedo. In addition, we demonstrate improved generalization under practical constraints on sensor spatial and temporal resolution. We believe our method is a promising direction as single-photon lidars become ubiquitous on consumer devices, such as phones, tablets, and headsets.

Paper Structure

This paper contains 52 sections, 11 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: PlatoNeRF. We propose PlatoNeRF: a method to recover scene geometry from a single view using two-bounce signals captured by a single-photon lidar. (a) A laser illuminates a scene point, which diffusely reflects light in all directions. The reflected light illuminates the rest of the scene and casts shadows. Light that returns to the lidar sensor provides information about the visible scene, and cast shadows provide information about occluded portions of the scene. (b) The lidar sensor captures 3D time-of-flight images. (c) By aggregating several such images (by scanning the position of the laser), we are able to reconstruct the entire 3D scene geometry with volumetric rendering.
  • Figure 2: Problem Definition. We use a lidar system containing a SPAD at position $\mathbf{x}_s$ and a pulsed laser at position $\mathbf{x}_l$. The SPAD view is kept constant, while the laser sequentially illuminates different points in the scene, $\{\mathbf{l}_1, ..., \mathbf{l}_{K}\}$. For each illumination spot, we measure the time of flight for light to travel $\mathbf{x}_l \xrightarrow{d_1} \mathbf{l} \xrightarrow{d_2} \mathbf{x}_p \xrightarrow{d_3} \mathbf{x}_s$, shown by the captured transient.
  • Figure 3: Method. PlatoNeRF learns 3D scene geometry from single-view two-bounce lidar time of flight, modeled with NeRF. Our method consists of three steps. (a) First, we render primary rays from the camera to the scene. (b) Second, we model rays that scatter and travel to the virtual light (the point where light rays first hit the scene). Both steps are supervised with transients measured by a single-photon lidar. (c) Third, we find that reconstructing the two-bounce time of flight enables 3D reconstruction.
  • Figure 4: Qualitative Depth Results. We provide qualitative results for predicted depth on both train and novel test views, comparing our method, BF Lidar [HenleHR2022], and $\text{S}^3$-NeRF [YangCCCW2022a] to the ground truth across four scenes. Each method is trained from the one train view shown and reconstructs the entire scene.
  • Figure 5: Real-World Results. (a) Captured scene (stars are illumination spots), (b) BF Lidar result, (c) PlatoNeRF result. Our method yields results similar to BF Lidar, with far fewer artifacts and holes.
  • ...and 8 more figures
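
The two-bounce path described in Figure 2 and the relation $t_{peak}=d/c$ from the TL;DR can be illustrated with a minimal sketch. This is not the authors' code: the function names, the example coordinates, the co-located laser and sensor, and the 128 ps bin width are all hypothetical values chosen for illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def two_bounce_tof(x_l, l, x_p, x_s):
    """Path length d = d1 + d2 + d3 and arrival time t_peak = d / c
    for the path x_l -> l -> x_p -> x_s (symbols as in Figure 2)."""
    x_l, l, x_p, x_s = (np.asarray(v, dtype=float) for v in (x_l, l, x_p, x_s))
    d1 = np.linalg.norm(l - x_l)    # laser to illumination spot
    d2 = np.linalg.norm(x_p - l)    # illumination spot to second scene point
    d3 = np.linalg.norm(x_s - x_p)  # second scene point to sensor
    d = d1 + d2 + d3
    return d, d / C


def transient_bin(t, bin_width_s):
    """Index of the histogram bin containing arrival time t (hypothetical helper)."""
    return int(t // bin_width_s)


# Hypothetical geometry in metres; laser and sensor co-located at the origin
# purely to give a 3-4-5 triangle with easy arithmetic.
d, t = two_bounce_tof(
    x_l=[0.0, 0.0, 0.0],
    l=[0.0, 0.0, 3.0],
    x_p=[4.0, 0.0, 3.0],
    x_s=[0.0, 0.0, 0.0],
)
bin_index = transient_bin(t, bin_width_s=128e-12)  # assumed 128 ps bins
print(f"d = {d:.3f} m, t_peak = {t * 1e9:.3f} ns, bin = {bin_index}")
```

Here the three segments are 3 m, 4 m, and 5 m, so the total path is 12 m and the peak arrives at roughly 40 ns. A coarser bin width maps more distinct path lengths to the same bin, which is one way to picture the temporal-resolution constraint mentioned in the abstract.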