A Plug-and-Play Algorithm for 3D Video Super-Resolution of Single-Photon LiDAR data

Alice Ruget, Lewis Wilson, Jonathan Leach, Rachael Tobin, Aongus McCarthy, Gerald S. Buller, Steve McLaughlin, Abderrahim Halimi

TL;DR

The paper addresses robust 3D video reconstruction from high-frame-rate SPAD histograms captured between lower-rate passive-camera frames. It introduces a plug-and-play optimization that alternates between flow-based alignment, guided histogram denoising, and HVSR-driven super-resolution, formalized via half-quadratic splitting. The method demonstrates improved depth accuracy and edge sharpness over naive averaging and HVSR baselines on both simulated and real SPAD datasets, including indoor fast-motion, low-resolution consumer sensors, and outdoor long-range scenes. The approach enables blur-free, high-resolution 3D video from multimodal SPAD/passive-camera data, with potential impact on autonomous navigation, remote sensing, and mobile imaging.
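
For readers unfamiliar with half-quadratic splitting, the generic template is sketched below. This is the textbook form, not the paper's exact objective: the symbols $x$ (the unknown high-resolution depth video), $y$ (the observed SPAD histograms), $f$ (a data-fidelity term), $g$ (a prior), $z$ (an auxiliary variable), and the weights $\lambda$ and $\mu$ are notation assumed here for illustration.

$$
\min_{x} \; f(x; y) + \lambda\, g(x)
\quad\longrightarrow\quad
\min_{x,\,z} \; f(x; y) + \lambda\, g(z) + \frac{\mu}{2}\,\lVert x - z \rVert_2^2
$$

$$
x^{k+1} = \arg\min_{x} \; f(x; y) + \frac{\mu}{2}\,\lVert x - z^{k} \rVert_2^2,
\qquad
z^{k+1} = \arg\min_{z} \; \lambda\, g(z) + \frac{\mu}{2}\,\lVert x^{k+1} - z \rVert_2^2
$$

In a plug-and-play scheme, the $z$-update (a proximal, denoising-type step) is replaced by an off-the-shelf operator. The summary above suggests that the HVSR network takes this role; how the alignment, denoising, and super-resolution steps map onto the splitting is defined in the paper itself.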

Abstract

Single-photon avalanche diodes (SPADs) are advanced sensors capable of detecting individual photons and recording their arrival times with picosecond resolution using time-correlated single-photon counting techniques. They are used in various applications, such as LiDAR, and can capture high-speed sequences of binary single-photon images, offering great potential for reconstructing 3D environments with high motion dynamics. To complement single-photon data, they are often paired with conventional passive cameras, which capture high-resolution (HR) intensity images at a lower frame rate. However, 3D reconstruction from SPAD data faces challenges. Aggregating multiple binary measurements improves precision and reduces noise but can cause motion blur in dynamic scenes. Additionally, SPAD arrays often have lower resolution than passive cameras. To address these issues, we propose a novel computational imaging algorithm that improves the 3D reconstruction of moving scenes from SPAD data by correcting the motion blur and increasing the native spatial resolution. We adopt a plug-and-play approach within an optimization scheme that alternates between guided video super-resolution of the 3D scene and precise image realignment using optical flow. Experiments on synthetic data show significantly improved image resolutions across various signal-to-noise ratios and photon levels. We validate our method using real-world SPAD measurements in three practical situations with dynamic objects: first, fast-moving scenes in laboratory conditions at short range; second, very low-resolution imaging of people with a consumer-grade SPAD sensor from STMicroelectronics; and finally, HR imaging of people walking outdoors in daylight at a range of 325 meters under eye-safe illumination conditions using a short-wave infrared SPAD camera. These results demonstrate the robustness and versatility of our approach.
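
The trade-off described in the abstract, where summing binary frames raises the photon count but smears moving objects, can be illustrated with a small NumPy sketch. It is a toy model under stated assumptions: the function name `aggregate_binary_frames` and the `shifts` argument are hypothetical, and motion is reduced to a known integer translation per frame, whereas the paper estimates motion with optical flow.

```python
import numpy as np

def aggregate_binary_frames(frames, shifts=None):
    """Sum binary SPAD frames into one photon-count image.

    frames: array of shape (M, H, W) holding 0/1 detections.
    shifts: optional list of integer (dy, dx) displacements, one per frame,
            giving the motion to undo before summing (hypothetical input;
            a real pipeline would derive this from estimated optical flow).
    """
    frames = np.asarray(frames)
    total = np.zeros(frames.shape[1:], dtype=np.int64)
    for j, frame in enumerate(frames):
        if shifts is not None:
            dy, dx = shifts[j]
            # Undo the displacement (circular shift, fine for this toy case).
            frame = np.roll(frame, shift=(-dy, -dx), axis=(0, 1))
        total += frame
    return total

# Toy scene: one bright pixel moving one column to the right per frame.
M, H, W = 4, 5, 5
frames = np.zeros((M, H, W), dtype=np.uint8)
for j in range(M):
    frames[j, 2, j] = 1

naive = aggregate_binary_frames(frames)                      # blurred streak
aligned = aggregate_binary_frames(frames, [(0, j) for j in range(M)])

print(naive[2])    # counts spread over four columns
print(aligned[2])  # all counts collected in the first column
```

Both images contain the same four detections. Only the aligned sum concentrates them in a single pixel, which is the motivation for realigning frames before merging them.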

Paper Structure

This paper contains 15 sections, 6 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Overview of the proposed approach and results.
  • Figure 2: Proposed method. From top to bottom: The scene is simultaneously recorded using a single-photon LiDAR camera and a passive camera. The passive camera operates at a lower frame rate than the SPAD sensor, resulting in $M$ SPAD frames ${\boldsymbol{H}}^{j,t}$ with $j \in [1, M]$ between two consecutive intensity frames $R_{t-1}$ and $R_{t}$. The proposed plug-and-play approach iteratively performs three steps: first, an alignment step based on the motion $m^t$ estimated between time $t-1$ and $t$; second, a denoising step merging the aligned frames into a histogram; and third, a video super-resolution step operating on six low-resolution histograms to produce six high-resolution depth images, where ${\boldsymbol{D}}^t$ represents the $t$th map.
  • Figure 3: Results on simulated data for different movement speeds. To simulate motion, we artificially shift the entire image in all xyz directions across frames. (a) presents the results for a small movement speed of 0.1 pixels per binary frame (equivalent to 0.21 cm) along the x and y axes, leading to a total displacement of 21 cm between two intensity images. In the depth direction, the movement speed is 0.1 bins per binary frame, corresponding to a depth change of 5 cm between two intensity images. A video representation of this scenario is provided in Visualisation 1. (b) shows the results for double the movement speed, with a displacement of 42 cm in the x and y axes, and 10 cm in the z-axis between two intensity images. The simulations were conducted with an SBR of 16 and a ppp of 64.
  • Figure 4: Results on simulated data for different noise conditions. On the left, we display the intensity image alongside the high-resolution ground truth depth map used as a reference. On the right, we present the results for three different noise conditions: the top row corresponds to a favourable regime with an SBR of 16 and 64 photons per pixel (ppp); the middle row shows the results for a sparse regime with SBR = 256 and ppp = 4; and the last row displays results for a noisy regime with SBR = 4 and ppp = 64. The simulations are performed with a movement speed that results in a displacement of 21 cm in the x and y directions, and 5 cm in depth, between consecutive intensity images.
  • Figure 5: Evaluation metrics for simulated data across different noise levels. In each plot, the y-axis shows the SBR, ranging from $2^2$ to $2^{10}$, and the x-axis shows the ppp, ranging from $2^{-1}$ to $2^6$. From left to right, the first two plots show the percentage of correct pixels for thresholds of 5 cm and 3 cm, and the third plot displays the Root Mean Square Error (RMSE) in centimeters. The simulations are conducted with a movement speed corresponding to a displacement of 21 cm along the x and y axes, and 5 cm in depth between two RGB images.
  • ...and 5 more figures
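
The metrics named in the Figure 5 caption can be sketched in a few lines of NumPy. This is an illustrative reading rather than the authors' evaluation code: the function name `depth_metrics` is hypothetical, and a pixel is assumed to count as "correct" when its absolute depth error is at or below the threshold, a definition the caption does not spell out.

```python
import numpy as np

def depth_metrics(pred_cm, truth_cm, thresholds_cm=(5.0, 3.0)):
    """Return RMSE (cm) and percentage of correct pixels per threshold.

    Assumption: a pixel is "correct" if |pred - truth| <= threshold.
    """
    pred = np.asarray(pred_cm, dtype=float)
    truth = np.asarray(truth_cm, dtype=float)
    abs_err = np.abs(pred - truth)
    # Root mean square error over all pixels, in centimeters.
    rmse = float(np.sqrt(np.mean(abs_err ** 2)))
    # Share of pixels whose error is within each threshold, in percent.
    pct_correct = {
        thr: float(100.0 * np.mean(abs_err <= thr)) for thr in thresholds_cm
    }
    return rmse, pct_correct

# Tiny example: a flat ground truth and four pixel errors of 0, 2, 4, 6 cm.
truth = np.zeros((2, 2))
pred = np.array([[0.0, 2.0], [4.0, 6.0]])
rmse, pct = depth_metrics(pred, truth)
print(rmse)  # square root of 14, about 3.74 cm
print(pct)   # 75% within 5 cm, 50% within 3 cm
```

Reporting both numbers is useful because RMSE is dominated by a few large outliers, while the thresholded percentages show how much of the depth map is usable.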