On-Device Super Resolution Imaging Using Low-Cost SPAD Array and Embedded Lightweight Deep Learning

Zhenya Zang, Xingda Li, David Day Uei Li

Abstract

This work presents a lightweight super-resolution (LiteSR) neural network for depth and intensity images acquired from a consumer-grade single-photon avalanche diode (SPAD) array with a 48x32 spatial resolution. The proposed framework reconstructs high-resolution (HR) images of size 256x256. Both synthetic and real datasets are used for performance evaluation. Extensive quantitative metrics demonstrate high reconstruction fidelity on synthetic datasets, while experiments on real indoor and outdoor measurements further confirm the robustness of the proposed approach. Moreover, the SPAD sensor is interfaced with an Arduino UNO Q microcontroller, which receives low-resolution (LR) depth and intensity images and feeds them into a compressed, pre-trained deep learning (DL) model, enabling real-time SR video streaming. In addition to the 256x256 setting, a range of target HR resolutions is evaluated to determine the maximum achievable upscaling resolution (512x512) with LiteSR, including scenarios with noise-corrupted LR inputs. The proposed LiteSR-embedded system co-design provides a scalable, cost-effective solution to enhance the spatial resolution of current consumer-grade SPAD arrays to meet HR imaging requirements.
Paper Structure

This paper contains 10 sections, 5 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: LiteSR architecture, composed of intensity-guided depth and intensity branches. The numbers of channels for the EARB, RLFB, Depth Head, and Intensity Head are denoted by CH_E, CH_R, CH_DH, and CH_IH.
  • Figure 2: Comparison of super-resolution results on intensity test images for $4\times$ upscaling. From left to right: GT, LR, bicubic interpolation, LiteSR (FP32), and LiteSR (INT8). Quantitative metrics (MS-SSIM, PSNR, and GMSD) are reported for each reconstructed image. The input LR image has dimensions 32x48, resulting in different aspect ratios for the output HR images.
  • Figure 3: Comparison of super-resolution results on depth test images for $4\times$ upscaling. From left to right: GT, LR, bicubic interpolation, LiteSR (FP32), and LiteSR (INT8). The input LR image has dimensions 32x48, resulting in different aspect ratios for the output HR images.
  • Figure 4: Hot pixel detection and compensation in photon counting mode using a white flat target. (a) Raw mean intensity map showing spatially distributed hot pixels with anomalously high DCRs. (b) Binary hot-pixel mask generated by thresholding at 250 photon counts. (c) Corrected intensity image obtained by replacing hot pixels with the local $3\times3$ neighborhood average.
  • Figure 5: SR results on real-world intensity captures. RGB views of the scenes are shown together with bicubic, LiteSR (FP32), and LiteSR (INT8) reconstructions and their LV scores.
  • ...and 5 more figures
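The hot-pixel compensation described in the Figure 4 caption (threshold the raw intensity map at 250 photon counts, then replace each flagged pixel with the average of its local $3\times3$ neighborhood) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the exclusion of neighboring hot pixels from the average, and the NumPy-based formulation are assumptions.

```python
import numpy as np

def compensate_hot_pixels(intensity, threshold=250):
    """Illustrative hot-pixel correction for a SPAD intensity map.

    intensity: 2D array of mean photon counts.
    threshold: counts above this are flagged as hot pixels
               (250 in the paper's Figure 4).
    Returns (corrected image, binary hot-pixel mask).
    """
    mask = intensity > threshold                 # (b) binary hot-pixel mask
    corrected = intensity.astype(np.float64).copy()
    h, w = intensity.shape
    for y, x in zip(*np.nonzero(mask)):
        # Clip the 3x3 window at the image borders.
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        patch = corrected[y0:y1, x0:x1]
        valid = ~mask[y0:y1, x0:x1]              # ignore other hot pixels
        if valid.any():
            # (c) replace the hot pixel with the local neighborhood average
            corrected[y, x] = patch[valid].mean()
    return corrected, mask
```

On a flat-field capture of a white target, the mask isolates pixels with anomalously high dark count rates, and the replacement step removes them without disturbing the surrounding counts.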