Transient Neural Radiance Fields for Lidar View Synthesis and 3D Reconstruction

Anagh Malik, Parsa Mirdehghan, Sotiris Nousias, Kiriakos N. Kutulakos, David B. Lindell

TL;DR

We address lidar view synthesis and 3D reconstruction from time-resolved single-photon measurements by introducing Transient Neural Radiance Fields that integrate a time-resolved image-formation model into NeRFs. The method uses a time-resolved volume rendering equation and a neural representation to render photon-count transients from novel viewpoints, supported by an HDR-inspired loss and space carving regularization. Experiments on a first-of-its-kind simulated and hardware-captured transient multiview lidar dataset show improved geometry and appearance with as few as 2–5 input views, enabling realistic rendering of transient lidar data and high-fidelity depth estimation. This work extends NeRFs to transient imaging, offering a new modality for lidar simulation and downstream tasks in autonomous driving, robotics, and remote sensing.

Abstract

Neural radiance fields (NeRFs) have become a ubiquitous tool for modeling scene appearance and geometry from multiview imagery. Recent work has also begun to explore how to use additional supervision from lidar or depth sensor measurements in the NeRF framework. However, previous lidar-supervised NeRFs focus on rendering conventional camera imagery and use lidar-derived point cloud data as auxiliary supervision; thus, they fail to incorporate the underlying image formation model of the lidar. Here, we propose a novel method for rendering transient NeRFs that take as input the raw, time-resolved photon count histograms measured by a single-photon lidar system, and we seek to render such histograms from novel views. Unlike conventional NeRFs, our approach relies on a time-resolved version of the volume rendering equation to render the lidar measurements and capture transient light transport phenomena at picosecond timescales. We evaluate our method on a first-of-its-kind dataset of simulated and captured transient multiview scans from a prototype single-photon lidar. Overall, our work brings NeRFs to a new dimension of imaging at transient timescales, enabling rendering of transient imagery from novel views for the first time. Additionally, we show that our approach recovers improved geometry and conventional appearance compared to point cloud-based supervision when training on few input viewpoints. Transient NeRFs may be especially useful for applications that seek to simulate raw lidar measurements for downstream tasks in autonomous driving, robotics, and remote sensing.

Paper Structure (24 sections, 7 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Overview of transient neural radiance fields (Transient NeRFs). Measurements from a single-photon lidar are captured using a single-photon avalanche diode (SPAD), pulsed laser, scanning mirrors, and a time-correlated single photon counter (TCSPC). The lidar scans, consisting of a 2D array of photon count histograms (visualized with maximum-intensity projection), are captured from multiple viewpoints and used to optimize the transient NeRF. After training, we render novel views of time-resolved lidar measurements ($x$--$y$ and $x$--$t$ slices are indicated by the dotted red lines), and we also convert the rendered data into intensity and depth maps.
  • Figure 2: Rendering transient neural radiance fields. We cast rays through a volume and retrieve the density and color at each point using a neural representation (Müller et al., 2022). A time-resolved measurement is constructed using volume rendering (Equation \ref{eq:nerf}), and we bin the radiance contributions into an array based on distance along the ray. The result is convolved with the impulse response of the lidar (which incorporates the shape of the laser pulse), and we supervise the neural representation based on the difference between the rendered and captured transient measurements.
  • Figure 3: Hardware prototype. A pulsed laser shares a path with a single-pixel SPAD, and the illumination and imaging path are controlled by scanning mirrors.
  • Figure 4: Results on simulated data. We show images from depth-supervised NeRF baselines as well as color images and rendered transients from our method after training on 2, 3, and 5 viewpoints. The proposed method produces cleaner results and generates 3D transients for each viewpoint.
  • Figure 5: Comparison of depth maps recovered from simulated measurements trained on 5 views of the lego scene.
  • ...and 2 more figures
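
The rendering steps described in the Figure 2 caption (volume rendering along a ray, binning contributions by distance, convolving with the lidar impulse response) and the conversion to intensity and depth mentioned in the Figure 1 caption can be sketched for a single ray as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the uniform sample spacing, the three-tap kernel standing in for the impulse response, and the peak-bin depth estimate are all hypothetical, and it uses the standard NeRF compositing weights, whereas the paper's time-resolved equation (not reproduced in this section) may include further terms such as round-trip attenuation.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def render_transient(sigmas, radiances, near, step, bin_width_s, n_bins):
    """Bin standard NeRF compositing weights into time bins for one ray.

    Assumes uniformly spaced samples starting at distance `near` (metres).
    """
    transient = [0.0] * n_bins
    transmittance = 1.0
    for i, (sigma, radiance) in enumerate(zip(sigmas, radiances)):
        alpha = 1.0 - math.exp(-sigma * step)   # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        dist = near + i * step
        t = 2.0 * dist / C                      # round-trip time of flight
        b = int(t / bin_width_s)
        if 0 <= b < n_bins:
            transient[b] += weight * radiance
        transmittance *= 1.0 - alpha
    return transient


def convolve_same(signal, kernel):
    """Discrete convolution with a centered kernel, cropped to len(signal)."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, kv in enumerate(kernel):
            idx = n - (k - half)
            if 0 <= idx < len(signal):
                out[n] += kv * signal[idx]
    return out


def intensity_and_depth(transient, bin_width_s):
    """Assumed conversion: total counts as intensity, peak bin center as depth."""
    intensity = sum(transient)
    peak = max(range(len(transient)), key=transient.__getitem__)
    depth = (peak + 0.5) * bin_width_s * C / 2.0
    return intensity, depth


# Toy ray: empty space with a thin, dense surface near 0.4 m.
n_samples = 100
step = 0.01                       # 1 cm between samples
near = 0.005                      # first sample sits mid-bin
bin_width_s = 2.0 * step / C      # each time bin spans 1 cm of depth
sigmas = [0.0] * n_samples
for i in (40, 41, 42):
    sigmas[i] = 500.0
radiances = [1.0] * n_samples

ideal = render_transient(sigmas, radiances, near, step, bin_width_s, n_samples)
impulse_response = [0.25, 0.5, 0.25]   # placeholder pulse shape, sums to 1
measured = convolve_same(ideal, impulse_response)
intensity, depth = intensity_and_depth(measured, bin_width_s)
print(intensity, depth)
```

In a training loop, `measured` would be compared against the captured photon count histogram for the same ray; the loss used for that comparison is not specified in this section, so it is left out of the sketch.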