Towards On-Device Learning and Reconfigurable Hardware Implementation for Encoded Single-Photon Signal Processing

Zhenya Zang, Xingda Li, David Day Uei Li

TL;DR

This work targets on-device learning for time-resolved single-photon signal processing by introducing OSOS-ELM, an online sequential extreme learning machine augmented with a One-Sided Jacobi rotation-based SVD (OJR-SVD) that efficiently computes the Moore–Penrose inverse during training. Implemented on a Xilinx ZCU104 and evaluated on FLIM, DCS, and LiDAR datasets, OSOS-ELM achieves accuracy comparable to software SVD-based benchmarks while delivering superior hardware efficiency and parallelism. The architecture splits training between the processing system (PS), which performs initial and one-batch training via OJR-SVD, and the programmable logic (PL), which runs throughput-optimized matrix-vector and matrix-matrix multiplication (MVM/MMM). This split enables real-time on-device learning with tunable trade-offs among latency, precision, and energy. The design is also assessed on a Jetson Xavier NX to explore other heterogeneous computing options. Overall, the study demonstrates a scalable hardware-software co-design for online learning in encoded single-photon sensing, with the potential to reduce data transfer, latency, and privacy concerns in edge photonics applications.
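The key numerical step named above is computing a Moore–Penrose inverse through a one-sided Jacobi rotation SVD. The sketch below is a textbook Hestenes one-sided Jacobi SVD in NumPy, not the authors' PS implementation; the function names `one_sided_jacobi_svd` and `jacobi_pinv`, the tolerance, and the sweep limit are illustrative assumptions. Each rotation touches only one pair of columns, so disjoint pairs are independent of each other, which is presumably what makes the method attractive for parallel hardware.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """Hestenes one-sided Jacobi SVD of a tall matrix: A = U @ diag(s) @ V.T."""
    W = np.array(A, dtype=float)          # copy; columns are rotated until mutually orthogonal
    n = W.shape[1]
    V = np.eye(n)                         # accumulates the same right-hand rotations
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = W[:, p] @ W[:, p]
                beta = W[:, q] @ W[:, q]
                gamma = W[:, p] @ W[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue              # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for M in (W, V):          # apply the 2x2 rotation to columns p and q
                    mp = M[:, p].copy()
                    M[:, p] = c * mp - s * M[:, q]
                    M[:, q] = s * mp + c * M[:, q]
        if not rotated:
            break
    sv = np.linalg.norm(W, axis=0)        # singular values are the final column norms
    U = np.zeros_like(W)
    nz = sv > 0
    U[:, nz] = W[:, nz] / sv[nz]
    return U, sv, V

def jacobi_pinv(A, rcond=1e-12):
    """Moore-Penrose inverse A^+ = V diag(1/s) U^T, dropping tiny singular values."""
    A = np.asarray(A, dtype=float)
    if A.shape[0] < A.shape[1]:           # wide matrix: use (A^T)^+ = (A^+)^T
        return jacobi_pinv(A.T, rcond).T
    U, sv, V = one_sided_jacobi_svd(A)
    s_inv = np.zeros_like(sv)
    keep = sv > rcond * sv.max()
    s_inv[keep] = 1.0 / sv[keep]
    return (V * s_inv) @ U.T              # scale columns of V, then multiply by U^T

A = np.random.default_rng(0).standard_normal((6, 4))
print(np.allclose(jacobi_pinv(A), np.linalg.pinv(A)))
```

Comparing against `np.linalg.pinv` is only a sanity check; on hardware the sweep count and convergence tolerance become the knobs that trade latency for precision.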

Abstract

Deep neural networks (DNNs) enhance the accuracy and efficiency of reconstructing key parameters from time-resolved photon arrival signals recorded by single-photon detectors. However, the performance of conventional backpropagation-based DNNs is highly dependent on various parameters of the optical setup and biological samples under examination, necessitating frequent network retraining, either through transfer learning or from scratch. Newly collected data must also be stored and transferred to a high-performance GPU server for retraining, introducing latency and storage overhead. To address these challenges, we propose an online training algorithm based on a One-Sided Jacobi rotation-based Online Sequential Extreme Learning Machine (OSOS-ELM). We fully exploit parallelism in executing OSOS-ELM on a heterogeneous FPGA with integrated ARM cores. Extensive evaluations of OSOS-ELM and OSELM demonstrate that both achieve comparable accuracy across different network dimensions (i.e., input, hidden, and output layers), while OSOS-ELM proves to be more hardware-efficient. By leveraging the parallelism of OSOS-ELM, we implement a holistic computing prototype on a Xilinx ZCU104 FPGA, which integrates a multi-core CPU and programmable logic fabric. We validate our approach through three case studies involving single-photon signal analysis: sensing through fog using commercial single-photon LiDAR, fluorescence lifetime estimation in FLIM, and blood flow index reconstruction in DCS, all utilizing one-dimensional data encoded from photonic signals. From a hardware perspective, we optimize the OSOS-ELM workload by employing multi-tasked processing on ARM CPU cores and pipelined execution on the FPGA's logic fabric. We also implement our OSOS-ELM on the NVIDIA Jetson Xavier NX GPU to comprehensively investigate its computing performance on another type of heterogeneous computing platform.
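The online training described above follows the general OS-ELM pattern: a random, fixed hidden layer; an initial batch solved through a Moore–Penrose inverse; then recursive least-squares updates for each new chunk of data. The following is a minimal NumPy sketch of standard OS-ELM under stated assumptions (sigmoid activation, uniform random weights in [-1, 1]); `np.linalg.pinv` stands in for the OJR-SVD step, and the class name `OSELM` and its methods are hypothetical, not the authors' code.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM: fixed random hidden layer, output weights updated by recursive least squares."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-1.0, 1.0, (n_in, n_hidden))  # input weights, never trained
        self.b = rng.uniform(-1.0, 1.0, n_hidden)          # hidden biases, never trained
        self.P = None     # running estimate of (H^T H)^-1
        self.beta = None  # output weights, shape (n_hidden, n_out)

    def hidden(self, X):
        # Sigmoid hidden-layer output matrix H, shape (n_samples, n_hidden)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_fit(self, X0, T0):
        # Initial training: needs at least n_hidden samples so H0^T H0 is invertible
        H0 = self.hidden(X0)
        self.P = np.linalg.pinv(H0.T @ H0)  # stand-in for the OJR-SVD pseudo-inverse
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Sequential update with one new chunk; previously seen samples are not needed
        H = self.hidden(X)
        K = self.P @ H.T                    # (n_hidden, chunk)
        S = np.eye(H.shape[0]) + H @ K      # (chunk, chunk) system to solve
        self.P = self.P - K @ np.linalg.solve(S, K.T)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 8))
T = np.sin(X.sum(axis=1, keepdims=True))    # toy 1-D regression target
model = OSELM(n_in=8, n_hidden=6)
model.init_fit(X[:40], T[:40])
for start in range(40, 200, 20):
    model.partial_fit(X[start:start + 20], T[start:start + 20])
```

With an exact initial inverse, the chunked updates reproduce the batch least-squares output weights over all samples seen so far, while each update only solves a chunk-sized system. This is the property that lets new data be absorbed without storing it for later retraining.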

Paper Structure

This paper contains 15 sections, 9 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Dataflow of data acquisition and processing for FLIM, DCS, and LiDAR: (a) Optical instruments, where histogram encoding and autocorrelator modules are integrated into the optical systems for single-photon LiDAR, FLIM, and DCS, along with standard single-photon detectors (using SPADs as an example) and timer modules. Offline training pipelines for FLIM and DCS are also included. $c$, $\tau$, and BFi denote object class, fluorescence lifetime, and blood flow index, respectively. (b) FPGA-based ELM online training. (c) A flowchart illustrating how newly acquired data from a new experimental system or with new optical parameters are handled. (d) OSOS-ELM training and inference dataflow (OSOS-ELM and ELM share the same topology).
  • Figure 2: Examples of FLIM decays with (a) two lifetime components and (b) different laser FWHM. Clean and noisy autocorrelation functions (ACFs) were generated using the same absorption and scattering coefficients and source-detector distance but with (c) different total averaging times ($t_a$) and (d) varying photon intensity. (e) and (f) show four example histograms with normalized photon counts collected from a mannequin in a chamber with heterogeneous fog, labeled as class 1 and class 2, respectively.
  • Figure 3: Software evaluation of accuracy versus the number of fractional bits in fixed-point (FXP) format for (a) $\tau_A$ and $\tau_I$ in FLIM and (b) BFi and $\beta$ in DCS. Blue and red dashed lines indicate the reference accuracy obtained with floating-point (FLP) arithmetic.
  • Figure 4: Overview of the hardware architecture. (a) Enabled CPU cores (orange), functions computing IT, and instantiated hardware APIs in the PS; (b) three IP cores, i.e., the data loading, training, and inference modules; (c) memory segmentation and addresses for storing matrices and flag signals.
  • Figure 5: Evaluation of object classification using LiDAR histograms from OSOS-ELM. (a) Confusion matrices and average accuracy. (b) AUC scores and ROC curves for each class.
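Figure 3 sweeps the number of fractional bits in the fixed-point (FXP) format against a floating-point (FLP) reference. The sketch below runs that kind of experiment on a single matrix-vector product, the operation the PL accelerates. It assumes round-to-nearest with saturation, a signed 8-bit integer part, and random test data; the helper names `quantize_fxp` and `fxp_mvm` are hypothetical, none of these settings are taken from the paper, and accumulating in float64 before a final rounding is a simplification of a real hardware accumulator.

```python
import numpy as np

def quantize_fxp(x, int_bits, frac_bits):
    """Round to a signed fixed-point grid with `frac_bits` fractional bits, then saturate."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits - 1))                # most negative value (sign bit counted in int_bits)
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale     # most positive representable value
    return np.clip(np.round(x * scale) / scale, lo, hi)

def fxp_mvm(A, x, int_bits=8, frac_bits=12):
    """Matrix-vector product with quantized operands and a quantized result."""
    Aq = quantize_fxp(A, int_bits, frac_bits)
    xq = quantize_fxp(x, int_bits, frac_bits)
    return quantize_fxp(Aq @ xq, int_bits, frac_bits)

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, (32, 64))
x = rng.uniform(-1, 1, 64)
reference = A @ x                                # FLP reference
for f in (4, 8, 12, 16):
    err = np.max(np.abs(fxp_mvm(A, x, 8, f) - reference))
    print(f"{f:2d} fractional bits: max abs error {err:.2e}")
```

Each extra fractional bit roughly halves the rounding step, so the error shrinks geometrically until it falls below what the downstream estimate (lifetime, BFi, or class) can resolve; that knee is what a sweep like Figure 3 is used to locate.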