Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar

Weihan Luo, Anagh Malik, David B. Lindell

TL;DR

Transientangelo introduces a few-viewpoint surface reconstruction framework that leverages raw time-resolved transients from a single-photon lidar to optimize a hash-grid SDF surface representation. By rendering time-resolved transients through a neural surface and applying targeted regularizers, it achieves high-fidelity geometry with as few as 10 photons per pixel and 2–5 viewpoints. The approach surpasses depth and mesh baselines in simulated and captured data, and demonstrates robustness to photon-starved conditions, with a new multiview transient dataset supporting evaluation. This work advances practical 3D reconstruction for low-light, high-speed, or long-range lidar scenarios by exploiting transient information and cross-view constraints.

Abstract

We consider the problem of few-viewpoint 3D surface reconstruction using raw measurements from a lidar system. Lidar captures 3D scene geometry by emitting pulses of light to a target and recording the speed-of-light time delay of the reflected light. However, conventional lidar systems do not output the raw, captured waveforms of backscattered light; instead, they pre-process these data into a 3D point cloud. Since this procedure typically does not accurately model the noise statistics of the system, exploit spatial priors, or incorporate information about downstream tasks, it ultimately discards useful information that is encoded in raw measurements of backscattered light. Here, we propose to leverage raw measurements captured with a single-photon lidar system from multiple viewpoints to optimize a neural surface representation of a scene. The measurements consist of time-resolved photon count histograms, or transients, which capture information about backscattered light at picosecond time scales. Additionally, we develop new regularization strategies that improve robustness to photon noise, enabling accurate surface reconstruction with as few as 10 photons per pixel. Our method outperforms other techniques for few-viewpoint 3D reconstruction based on depth maps, point clouds, or conventional lidar as demonstrated in simulation and with captured data.

Paper Structure (45 sections, 17 equations, 19 figures, 14 tables)

Figures (19)

  • Figure 1: Transientangelo takes as input raw lidar scans from sparse viewpoints. These scans are used to optimize a scene representation based on a signed distance function, which is further regularized to constrain the geometry from both captured and unseen viewpoints. The method recovers higher-fidelity surfaces than previous methods in the sparse-view and low-photon regime (i.e., from tens to hundreds of measured photons per pixel). The above scene was trained using simulated single-photon lidar data from five viewpoints, with an average of 150 photons per pixel over the occupied regions of the scene.
  • Figure 2: Transientangelo training procedure. For a pixel $\mathbf{p}$, we cast a ray $\mathbf{r}(t) = \mathbf{o}+tc\boldsymbol{\omega}(\mathbf{p})$. Captured viewpoint: 3D ray coordinates are passed through the neural surface representation $\mathcal{F}$ to retrieve a radiance and an SDF value, the latter of which is converted to a density $\sigma$ (see Equation \ref{eq:density}). These values are then binned into a transient, which, after convolution with the laser pulse, gives the final rendered transient $\boldsymbol{\tau}_f[i, j, k]$. The network is supervised with an L1 loss between the rendered and captured transients and an L1 loss between the integrals of the rendered and captured transients (reflectivity loss). Unseen viewpoint: 3D ray coordinates are passed through the neural representation to retrieve rendering weights, which are used to calculate the variance of the weights around the depth $d$. The network is trained to minimize this variance, which results in thinner surfaces and removes spurious zero-level sets.
  • Figure 3: Results on the simulated dataset using an average of 6000 photons per occupied (non-background) pixel. We show the recovered meshes from the baselines and the proposed method. For the transient-based methods, we also show rendered transients for the indicated pixels. Our method recovers smoother meshes with fewer missing parts. We also recover transients that better match the ground truth.
  • Figure 4: Results on the captured dataset. We show the recovered meshes from the baselines and the proposed method. Because no ground truth is available, we include the closest captured image as a reference for the scene. As can be seen, our method recovers smoother meshes with fewer missing parts. We also recover transients that better match the ground truth.
  • Figure 5: Novel view synthesis for varying photon levels. We show rendered novel views of the cinema scene, trained on five viewpoints with an average of $50$ and $300$ photons. We also show peak-time visualizations [malik2024flying], which summarize the full transient in a single image. Hue encodes the time of peak intensity, brightness is modulated by the maximum intensity, and each band corresponds to an isochrone, or wavefront of equal path length. We show transient plots (right) for the pixel indicated on the ground-truth image (blue dot).
  • ...and 14 more figures
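
The training procedure described in the Figure 2 caption can be sketched for a single ray in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and parameter values are hypothetical, the SDF-to-density conversion uses a Laplace-CDF form as a stand-in because the paper's density equation is not reproduced on this page, and the sketch omits the hash-grid network, distance falloff, and any sensor-specific effects.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def sdf_to_density(sdf, beta=0.01):
    """ASSUMPTION: Laplace-CDF density, a stand-in for the paper's density equation."""
    if sdf >= 0:
        cdf = 0.5 * math.exp(-sdf / beta)
    else:
        cdf = 1.0 - 0.5 * math.exp(sdf / beta)
    return cdf / beta


def render_weights(sdfs, step):
    """Volume-rendering weights w_i = T_i * alpha_i along one ray."""
    weights, transmittance = [], 1.0
    for sdf in sdfs:
        alpha = 1.0 - math.exp(-sdf_to_density(sdf) * step)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_transient(dists, weights, radiance, n_bins, bin_width_s, pulse):
    """Bin weighted radiance by round-trip time, then convolve with the laser pulse."""
    hist = [0.0] * n_bins
    for d, w, r in zip(dists, weights, radiance):
        k = int(2.0 * d / C / bin_width_s)  # round-trip time -> histogram bin
        if 0 <= k < n_bins:
            hist[k] += w * r
    half = len(pulse) // 2
    out = [0.0] * n_bins
    for k in range(n_bins):
        for j, p in enumerate(pulse):
            src = k - (j - half)
            if 0 <= src < n_bins:
                out[k] += p * hist[src]
    return out


def transient_losses(rendered, captured):
    """L1 loss on the transient and L1 loss on its integral (reflectivity loss)."""
    l1 = sum(abs(a - b) for a, b in zip(rendered, captured)) / len(rendered)
    reflectivity = abs(sum(rendered) - sum(captured))
    return l1, reflectivity


def weight_variance(dists, weights):
    """Expected depth d and the variance of the rendering weights around d."""
    total = sum(weights) + 1e-8
    depth = sum(w * d for w, d in zip(weights, dists)) / total
    var = sum(w * (d - depth) ** 2 for w, d in zip(weights, dists)) / total
    return depth, var


# Toy ray: a planar surface 1.5 m from the sensor, sampled every 1 cm.
step = 0.01
dists = [i * step for i in range(1, 301)]
sdfs = [1.5 - d for d in dists]
weights = render_weights(sdfs, step)
radiance = [1.0] * len(dists)

# Hypothetical Gaussian laser pulse, normalized to unit sum.
pulse = [math.exp(-0.5 * (j / 1.5) ** 2) for j in range(-4, 5)]
pulse_sum = sum(pulse)
pulse = [p / pulse_sum for p in pulse]

rendered = render_transient(dists, weights, radiance,
                            n_bins=256, bin_width_s=100e-12, pulse=pulse)
depth, var = weight_variance(dists, weights)
peak_bin = rendered.index(max(rendered))
```

In this toy setup the rendered transient peaks near the bin matching the surface's round-trip time, and the weight variance is small because the density is concentrated around the zero-level set. Per the caption, the two losses from `transient_losses` would supervise rays from captured viewpoints, while `weight_variance` would be minimized on rays from unseen viewpoints.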