Towards 3D Vision with Low-Cost Single-Photon Cameras

Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li

TL;DR

This work introduces a low-cost, time-resolved 3D vision system based on distributed SPAD proximity sensors to reconstruct complex Lambertian objects. It combines a differentiable transient formation model with a neural implicit surface (SDF) representation, rendering transients via volume rendering and optimizing against observed histograms in an analysis-by-synthesis framework. Across simulations and real hardware, the method substantially outperforms reprojection and space carving baselines, achieving Chamfer distances on the order of a few millimeters and demonstrating robustness to ambient light and texture. The approach paves the way for practical, energy-efficient 3D sensing with commodity hardware in robotics, wearables, and mobile platforms, while outlining future directions for handling non-Lambertian reflectance and real-time operation.
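The Chamfer distance quoted above measures how far two point sets lie from each other on average. The sketch below shows one common symmetric form of the metric; it is an assumption, not necessarily the exact variant used in the paper, since conventions differ (squared vs. unsquared distances, sum vs. mean of the two directions). The function name `chamfer_distance` is illustrative.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: unsquared Euclidean distances, averaged per set,
    then averaged over both directions. Brute force, O(len(a) * len(b)).
    """
    def mean_nearest(src, dst):
        # Average distance from each point in src to its nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a))


if __name__ == "__main__":
    reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(reconstruction, ground_truth))
```

If the input coordinates are in meters, the result is in meters too, so a value of a few millimeters corresponds to roughly 0.001 to 0.005.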

Abstract

We present a method for reconstructing the 3D shape of arbitrary Lambertian objects from measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time-resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse at high temporal resolution as it returns from the scene. We propose to model this image formation process, account for its non-idealities, and adapt neural rendering to reconstruct 3D geometry from a set of spatially distributed sensors with known poses. We show that our approach can successfully recover complex 3D shapes from simulated data. We further demonstrate 3D object reconstruction from real-world captures, using measurements from a commodity proximity sensor. Our work draws a connection between image-based modeling and active range scanning and is a step towards 3D vision with single-photon cameras.
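As a rough illustration of the measurement the abstract describes, the following toy sketch bins the returns from a few sampled surface points by round-trip time and then blurs the result with a pulse shape. It is a simplified sketch under stated assumptions, not the paper's image formation model: the inverse-square falloff per sample, the omission of the cosine foreshortening term, and the names `ideal_transient` and `convolve_causal` are all illustrative.

```python
C = 299_792_458.0  # speed of light in m/s


def ideal_transient(depths_m, albedos, num_bins, bin_width_s):
    """Accumulate per-sample returns into time bins (idealized, noise-free).

    Assumptions: co-located emitter and detector, each sample contributes
    albedo / depth^2, and cosine foreshortening is ignored.
    """
    hist = [0.0] * num_bins
    for depth, albedo in zip(depths_m, albedos):
        round_trip_s = 2.0 * depth / C          # light travels out and back
        bin_index = int(round_trip_s / bin_width_s)
        if 0 <= bin_index < num_bins:           # drop returns outside the window
            hist[bin_index] += albedo / (depth * depth)
    return hist


def convolve_causal(signal, kernel):
    """Blur the ideal transient with a pulse shape, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if value == 0.0:
            continue
        for j, weight in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += value * weight
    return out


if __name__ == "__main__":
    # One surface sample at 0.3 m, 100 ps bins, and a made-up two-bin pulse.
    ideal = ideal_transient([0.3], [1.0], num_bins=32, bin_width_s=1e-10)
    measured = convolve_causal(ideal, [0.5, 0.5])
    print(ideal.index(max(ideal)), measured[20], measured[21])
```

With 100 ps bins, a surface at 0.3 m has a round-trip time of about 2 ns and therefore lands in bin 20. Comparing such a rendered histogram against an observed one is the basis of the analysis-by-synthesis optimization the paper describes.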

Paper Structure (15 sections, 16 equations, 12 figures, 2 tables)

This paper contains 15 sections, 16 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: We demonstrate that measurements from spatially distributed low-cost single-photon proximity sensors (left) can be used to reconstruct the 3D shape of real-world objects (right). Our method combines a differentiable image formation model and neural rendering to recover 3D geometry from measurements (transient histograms) taken by sensors with known poses. This is done by minimizing the difference between the observed and rendered sensor measurements. For clarity, only a subset of sensor poses and measurements is shown.
  • Figure 2: Method Overview: The scene is modeled as a neural implicit surface in the form of an SDF. To render a transient, we approximate Eq. \ref{eq:angular} by sampling rays within each pixel's FoV, and subsequently points on those rays. This idealized transient waveform is then convolved with the sensor's laser impulse response to model the transient histogram formation. Finally, we optimize the scene representation by minimizing a loss between the rendered transients and the observations.
  • Figure 3: Qualitative results on simulated data. Our method reconstructs dense and detailed 3D shapes. Space carving provides only hulls of a target shape, and is prone to carving away extra space when thin structures are present. Reprojection yields sparse points.
  • Figure 4: To capture real-world data from a wide set of viewpoints, we mount the TMF8820 proximity sensor on a robot arm. The sensor pose for each capture is obtained from the robot's forward kinematics.
  • Figure 5: Qualitative results on real-world captures. Our method again attains the highest reconstruction quality. Poses in column two are subsampled by a factor of two for clarity. See supplement for additional qualitative results.
  • ...and 7 more figures