Shoot-Bounce-3D: Single-Shot Occlusion-Aware 3D from Lidar by Decomposing Two-Bounce Light

Tzofi Klinghoffer, Siddharth Somasundaram, Xiaoyu Xiang, Yuchen Fan, Christian Richardt, Akshat Dave, Ramesh Raskar, Rakesh Ranjan

TL;DR

SB3D introduces a data-driven framework for demultiplexing two-bounce light in single-shot, multiplexed single-photon lidar, enabling occlusion-aware 3D reconstruction of scenes that include mirrors. It creates a large-scale simulated dataset of ~100k transients and trains models to separate per-spot time of flight (ToF) and shadows, followed by neural rendering with PlatoNeRF to recover 3D geometry. The method demonstrates depth, occluded geometry, and specular segmentation from a single capture, with real-world validation on multiplexed 16-spot data. This work advances single-shot 3D sensing and opens avenues for SPAD-based foundation models and RGB-Lidar fusion.

Abstract

3D scene reconstruction from a single measurement is challenging, especially in the presence of occluded regions and specular materials, such as mirrors. We address these challenges by leveraging single-photon lidars. These lidars estimate depth from light that is emitted into the scene and reflected directly back to the sensor. However, they can also measure light that bounces multiple times in the scene before reaching the sensor. This multi-bounce light contains additional information that can be used to recover dense depth, occluded geometry, and material properties. Prior work with single-photon lidar, however, has only demonstrated these use cases when a laser sequentially illuminates one scene point at a time. We instead focus on the more practical - and challenging - scenario of illuminating multiple scene points simultaneously. The complexity of light transport due to the combined effects of multiplexed illumination, two-bounce light, shadows, and specular reflections is challenging to invert analytically. Instead, we propose a data-driven method to invert light transport in single-photon lidar. To enable this approach, we create the first large-scale simulated dataset of ~100k lidar transients for indoor scenes. We use this dataset to learn a prior on complex light transport, enabling measured two-bounce light to be decomposed into the constituent contributions from each laser spot. Finally, we experimentally demonstrate how this decomposed light can be used to infer 3D geometry in scenes with occlusions and mirrors from a single measurement. Our code and dataset are released at https://shoot-bounce-3d.github.io.
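To make the multiplexing ambiguity concrete, the following is a minimal sketch, not the authors' simulator. It assumes an idealized, noise-free sensor pixel, a co-located laser and sensor, one photon per light path, and a hypothetical 1 ns bin width; the geometry and the helper names `two_bounce_tof` and `transient` are invented for illustration. Each two-bounce path travels laser -> illuminated spot -> scene point -> sensor, and a single-shot multiplexed capture records only the sum of the per-spot histograms.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def two_bounce_tof(laser, spot, point, sensor):
    """Travel time (s) for the path laser -> spot -> point -> sensor."""
    path = (np.linalg.norm(spot - laser)
            + np.linalg.norm(point - spot)
            + np.linalg.norm(sensor - point))
    return path / C

def transient(tof, n_bins=64, bin_width=1e-9):
    """Idealized histogram: one count in the time bin that contains `tof`."""
    hist = np.zeros(n_bins)
    hist[int(tof // bin_width)] += 1.0
    return hist

# Hypothetical geometry in metres: two laser spots on a back wall and one
# scene point observed by a single sensor pixel.
laser = sensor = np.array([0.0, 0.0, 0.0])
spots = [np.array([-1.0, 0.0, 3.0]), np.array([2.0, 0.0, 3.0])]
point = np.array([0.0, -1.0, 2.0])

# Sequential scanning would give one histogram per spot ...
per_spot = [transient(two_bounce_tof(laser, s, point, sensor)) for s in spots]
# ... whereas a multiplexed single shot only observes their sum.
multiplexed = np.sum(per_spot, axis=0)
peak_bins = np.flatnonzero(multiplexed)
```

The summed histogram contains one peak per illuminated spot but carries no label saying which spot produced which peak. Recovering that correspondence is the demultiplexing problem the abstract describes.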

Paper Structure

This paper contains 59 sections, 5 equations, 18 figures, 1 table.

Figures (18)

  • Figure 1: Multi-Bounce Signals. Shoot-Bounce-3D leverages multi-bounce signals measured from single-photon lidar. Multi-bounce light encodes (a) dense depth (from geometric constraints), (b) occluded geometry (from shadows), and (c) specular surfaces (from two- and three-bounce pairs). Existing techniques, however, assume that a single scene point is illuminated at a time, with a laser scanned over the scene. Multi-bounce lidars on consumer devices instead use (d) multiplexed illumination, in which multiple points are illuminated at once. This causes existing methods to fail due to (1) lack of correspondence between two-bounce peaks and illumination points, and (2) mixing of signals from (a), (b), and (c). To resolve these ambiguities, we employ a learning-based technique.
  • Figure 2: Method overview. Shoot-Bounce-3D (SB3D) performs 3D reconstruction from a single lidar measurement. The pipeline consists of three steps, each with its own output. (a) First, from a measurement taken with multiplexed illumination (meaning multiple points in the scene are illuminated at once), SB3D is trained to predict depth, which allows the 2-bounce time-of-flight (ToF) for each illumination point to be separated using ray geometries. Because our scenes contain specular objects, we find the ToF encoder used for this step also learns features that enable specular object segmentation. (b) The predicted 2-bounce ToF is unprojected into histograms and used with the lidar measurement to estimate shadows. Using the 2-bounce ToF allows the network to learn shadow transients, improving performance. (c) Finally, using the predicted 2-bounce ToF and shadows, PlatoNeRF can be trained for 3D reconstruction.
  • Figure 3: Shadow Transients. We leverage the idea of shadow transients to improve network training for shadow demultiplexing. The key idea is to estimate the light that never reached the sensor due to the object casting a shadow. In the top row, there is no object, so the shadow transient is empty. In the bottom row, two of the three light paths are blocked, so only one peak shows up in the measurement. The shadow transient, on the other hand, captures the two light paths that were blocked by the occluding object. In practice, the calibrated capture is estimated from the data, not measured. As a result, the calibrated capture is input to the network with the measured histogram to prevent errors due to inaccurate shadow transient estimation.
  • Figure 4: Proposed Dataset. Samples from our simulated dataset of multi-bounce transients for 100k scenes. Our dataset also contains RGB, depth, normals, and segmentation maps for each scene. Transients are simulated with varying amounts of multiplexed illumination -- shown as the intensity maps. Binary shadow maps are provided for each illumination point. The last row shows frames of the simulated lidar transient for an example scene.
  • Figure 5: Demultiplexing Results. We show qualitative results for demultiplexing both time of flight (ToF) and shadows. Each column denotes a different illumination source; our method extracts the two-bounce ToF and shadow maps for each source from the multiplexed lidar measurement. The last row shows frames from the predicted "light in flight" video (provided on our project page, https://shoot-bounce-3d.github.io/), which is rendered by combining the predicted two-bounce ToF and shadows into a transient measurement, allowing visualization of two-bounce light propagation per illumination point.
  • ...and 13 more figures
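
The shadow-transient idea in the Figure 3 caption can be illustrated with a small sketch. It assumes, purely for illustration, that a shadow transient is the non-negative difference between an unoccluded reference histogram (the "calibrated capture") and the measured histogram. The caption states that the calibrated capture is estimated from data in practice, so this subtraction, the bin indices, and the helper name `shadow_transient` are hypothetical and do not come from the released code.

```python
import numpy as np

def shadow_transient(calibrated, measured):
    """Light present in the unoccluded reference but missing from the measurement."""
    return np.clip(calibrated - measured, 0.0, None)

n_bins = 16

# Unoccluded reference: three two-bounce paths reach this pixel.
calibrated = np.zeros(n_bins)
calibrated[[3, 7, 12]] = 1.0

# With an occluder, the paths arriving in bins 3 and 12 are blocked,
# so only one peak remains in the measurement.
measured = np.zeros(n_bins)
measured[7] = 1.0

shadow = shadow_transient(calibrated, measured)
blocked_bins = np.flatnonzero(shadow)
```

This mirrors the bottom row of the figure: one peak survives in the measurement and the shadow transient holds the two blocked peaks. With no occluder, the measurement equals the reference and the shadow transient is empty, as in the top row.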