Towards On-Device Learning and Reconfigurable Hardware Implementation for Encoded Single-Photon Signal Processing
Zhenya Zang, Xingda Li, David Day Uei Li
TL;DR
This work targets on-device learning for time-resolved single-photon signal processing by introducing OSOS-ELM, an online sequential extreme learning machine (OSELM) augmented with a One-Sided Jacobi rotation-based SVD (OJR-SVD) to efficiently compute the Moore–Penrose inverse during training. Implemented on a Xilinx ZCU104 and evaluated on fluorescence lifetime imaging microscopy (FLIM), diffuse correlation spectroscopy (DCS), and LiDAR datasets, OSOS-ELM achieves accuracy comparable to software SVD-based benchmarks while offering higher hardware efficiency and parallelism. The architecture splits training between the processing system (PS, the ARM cores), which performs the initial and one-batch training via OJR-SVD, and the programmable logic (PL), which executes throughput-optimized matrix-vector and matrix-matrix multiplications (MVM/MMM). This split enables real-time on-device learning with tunable trade-offs among latency, precision, and energy. The design is also assessed on a Jetson Xavier NX to explore heterogeneous computing options. Overall, the study demonstrates a scalable hardware-software co-design for online learning in encoded single-photon sensing, with the potential to reduce data transfer, latency, and privacy concerns in edge photonics applications.
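The Moore–Penrose inverse at the core of OSOS-ELM training can be illustrated with a textbook one-sided Jacobi (Hestenes) SVD. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `one_sided_jacobi_svd` and `jacobi_pinv`, the convergence tolerance, the sweep limit, and the cyclic pair ordering are all hypothetical choices. Each plane rotation orthogonalizes one pair of columns; in hardware-oriented variants, rotations on disjoint column pairs can run in parallel, whereas this sketch simply visits the pairs in sequence.

```python
import numpy as np


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Hestenes one-sided Jacobi SVD of a tall matrix A (m >= n).

    Returns U (m x n), sigma (n,), V (n x n) with A ~= U @ diag(sigma) @ V.T.
    Singular values are not sorted. tol and max_sweeps are illustrative choices.
    """
    U = np.array(A, dtype=float, copy=True)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        # One sweep visits every column pair (i, j) in cyclic order.
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = U[:, i] @ U[:, i]
                beta = U[:, j] @ U[:, j]
                gamma = U[:, i] @ U[:, j]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue  # pair is already orthogonal to working precision
                rotated = True
                # Rotation angle that zeroes the inner product of columns i and j.
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the columns of U and accumulate it in V.
                for mat in (U, V):
                    col_i = mat[:, i].copy()
                    mat[:, i] = c * col_i - s * mat[:, j]
                    mat[:, j] = s * col_i + c * mat[:, j]
        if not rotated:
            break  # converged: all column pairs are mutually orthogonal
    # Column norms are the singular values; normalize the nonzero columns.
    sigma = np.linalg.norm(U, axis=0)
    nz = sigma > 0
    U[:, nz] /= sigma[nz]
    return U, sigma, V


def jacobi_pinv(A, rcond=None):
    """Moore-Penrose inverse A^+ = V diag(1/sigma) U^T from the Jacobi SVD."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if m < n:
        # Wide matrix: use (A^T)^+ = (A^+)^T so the SVD always sees a tall matrix.
        return jacobi_pinv(A.T, rcond).T
    U, sigma, V = one_sided_jacobi_svd(A)
    if rcond is None:
        rcond = max(m, n) * np.finfo(float).eps
    cutoff = rcond * sigma.max() if sigma.size else 0.0
    inv = np.zeros_like(sigma)
    keep = sigma > cutoff  # drop numerically zero singular values
    inv[keep] = 1.0 / sigma[keep]
    return (V * inv) @ U.T  # scales column k of V by 1/sigma_k
```

For a well-conditioned hidden-layer output matrix `H`, `jacobi_pinv(H)` should agree with `np.linalg.pinv(H)` to within floating-point tolerance; the choice of a one-sided scheme means only column pairs of a single matrix are rotated, which keeps the data movement regular.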
Abstract
Deep neural networks (DNNs) enhance the accuracy and efficiency of reconstructing key parameters from time-resolved photon arrival signals recorded by single-photon detectors. However, the performance of conventional backpropagation-based DNNs depends strongly on the parameters of the optical setup and on the biological samples under examination, so the networks must be retrained frequently, either through transfer learning or from scratch. Newly collected data must also be stored and transferred to a high-performance GPU server for retraining, which introduces latency and storage overhead. To address these challenges, we propose an online training algorithm, the One-Sided Jacobi rotation-based Online Sequential Extreme Learning Machine (OSOS-ELM). We exploit the parallelism of OSOS-ELM by executing it on a heterogeneous FPGA with integrated ARM cores. Extensive evaluations of OSOS-ELM and the conventional OSELM show that both achieve comparable accuracy across different network dimensions (i.e., input, hidden, and output layer sizes), while OSOS-ELM is more hardware-efficient. Leveraging this parallelism, we implement a holistic computing prototype on a Xilinx ZCU104 FPGA, which integrates a multi-core CPU and programmable logic fabric. We validate our approach through three case studies involving single-photon signal analysis: sensing through fog with a commercial single-photon LiDAR, fluorescence lifetime estimation in FLIM, and blood flow index reconstruction in DCS, all of which use one-dimensional data encoded from photonic signals. From a hardware perspective, we optimize the OSOS-ELM workload with multi-tasked processing on the ARM CPU cores and pipelined execution on the FPGA's logic fabric. We also implement OSOS-ELM on the NVIDIA Jetson Xavier NX GPU to investigate its computing performance on another type of heterogeneous computing platform.
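To make the online training loop concrete, here is a minimal sketch of the standard OSELM recursion that OSOS-ELM builds on: one initial batch solved with a pseudoinverse, followed by recursive least-squares updates for each new mini-batch. It rests on several assumptions: the class name `OSELM`, the tanh activation, and the Gaussian random input weights are illustrative choices, and `np.linalg.pinv` stands in for the One-Sided Jacobi SVD-based Moore–Penrose inverse that OSOS-ELM computes on the ARM cores. The matrix products inside `partial_fit` are the kind of MVM/MMM workload suited to pipelined execution on the programmable logic.

```python
import numpy as np


class OSELM:
    """Hypothetical minimal OSELM: one hidden layer, tanh activation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights and biases are drawn once and never trained (ELM property).
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None     # running (H^T H)^+ over all samples seen so far
        self.beta = None  # output weights, shape (n_hidden, n_out)

    def hidden(self, X):
        # Hidden-layer output matrix H = g(X W + b).
        return np.tanh(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        # Initial phase: P0 = (H0^T H0)^+, beta0 = P0 H0^T T0.
        H0 = self.hidden(X0)
        self.P = np.linalg.pinv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Sequential phase for one mini-batch:
        # P <- P - P H^T (I + H P H^T)^+ H P
        # beta <- beta + P H^T (T - H beta), using the updated P.
        H = self.hidden(X)
        PHt = self.P @ H.T
        K = np.linalg.pinv(np.eye(H.shape[0]) + H @ PHt)  # batch x batch inverse
        self.P = self.P - PHt @ K @ (H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

Calling `init_fit` once on an initial batch (with at least as many samples as hidden nodes, so that H0 can have full column rank) and then `partial_fit` on each new mini-batch yields output weights that match a batch least-squares fit on all data seen so far, so past samples never need to be stored or re-transferred.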
