Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging

Ismail Erbas; Vikas Pandey; Aporva Amarnath; Naigang Wang; Karthik Swaminathan; Stefan T. Radev; Xavier Intes

TL;DR

This work tackles the bottleneck of real-time fluorescence lifetime imaging (FLI) parameter extraction on FPGA hardware by compressing GRU-based Seq2Seq models. It combines weight reduction, post-training quantization (PTQ), quantization-aware training (QAT), and knowledge distillation (KD) to derive Seq2SeqLite, a hardware-friendly model that preserves accuracy. Notably, KD reduces the parameter count by about 98% while maintaining performance, and 8-bit quantization further improves efficiency, enabling concurrent real-time FLI analysis during data capture. The study identifies the 32 × 32 Seq2SeqLite model with KD as a particularly effective configuration, offering a practical path toward FPGA-based, real-time FLI analysis in clinical and research settings.
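The parameter savings from shrinking GRU layers, and the further storage saving from 8-bit weights, can be estimated with simple arithmetic. The following is a minimal sketch, assuming PyTorch-style GRU layers (three gate blocks, each with an input matrix, a recurrent matrix, and two bias vectors). The function names `gru_params` and `storage_bytes` are hypothetical, and the layer sizes in the usage lines are illustrative only; they are not the paper's teacher or student models and do not reproduce the reported 98% figure.

```python
def gru_params(input_size: int, hidden_size: int) -> int:
    """Trainable parameters in one GRU layer, counted as in torch.nn.GRU.

    Each of the three gate blocks (reset, update, candidate) has an
    input-to-hidden matrix (hidden x input), a hidden-to-hidden matrix
    (hidden x hidden) and two bias vectors (b_ih and b_hh).
    """
    per_gate = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return 3 * per_gate


def storage_bytes(n_params: int, bits: int) -> int:
    """Bytes needed to store n_params weights at a given bit width (no packing overhead)."""
    return n_params * bits // 8


# Hypothetical layer sizes, for illustration only (not the paper's models).
small = gru_params(input_size=1, hidden_size=32)
large = gru_params(input_size=256, hidden_size=256)

print(small, large)              # 3360 394752
print(storage_bytes(small, 32))  # 13440 bytes at float32
print(storage_bytes(small, 8))   # 3360 bytes at int8
```

Because the recurrent matrix grows with the square of the hidden size, reducing hidden width and depth dominates the parameter savings, while moving from 32-bit to 8-bit storage gives a fixed further 4× reduction.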

Abstract

Fluorescence lifetime imaging (FLI) is an important technique for studying cellular environments and molecular interactions, but its real-time application is limited by slow data acquisition, which requires capturing large stacks of time-resolved images, and by complex post-processing with iterative fitting algorithms. Deep learning (DL) models enable real-time inference, but they can be computationally demanding because of complex architectures and large matrix operations. This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware. Model compression is therefore crucial for practical real-time deployment. In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited to processing FLI time-series data, to enable deployment on resource-constrained FPGA boards. We perform an empirical evaluation of several compression techniques, including weight reduction, knowledge distillation (KD), post-training quantization (PTQ), and quantization-aware training (QAT), to reduce model size and computational load while preserving inference accuracy. Our compressed RNN model, Seq2SeqLite, balances computational efficiency and prediction accuracy, particularly at 8-bit precision. Applying KD reduced the model parameter size by 98% while retaining performance, making the model suitable for concurrent real-time FLI analysis on FPGA during data capture. This work is an important step toward integrating hardware-accelerated, real-time FLI analysis for fast biological processes.
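Post-training quantization maps already-trained floating-point weights to low-bit integers without retraining. The following is a minimal sketch of one common scheme, symmetric per-tensor int8 quantization, assuming round-to-nearest and a single scale per tensor. The paper's actual configuration (per-channel versus per-tensor scales, activation quantization, rounding mode) is not stated here and may differ, and the function names are hypothetical. QAT would instead insert this quantize-then-dequantize step into the forward pass during training so the network learns to tolerate the rounding error.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor post-training quantization to signed 8-bit integers.

    A single scale maps the largest-magnitude weight to +/-127; every weight is
    then rounded to the nearest integer step and clipped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [50, -127, 0, 100]
# Rounding error per weight is at most half a quantization step.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

Small weights such as 0.003 collapse to zero, which is why accuracy after PTQ depends on the weight distribution and why QAT or fine-tuning is often used at low bit widths.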

Paper Structure

This paper contains 13 sections, 1 equation, 2 figures, and 3 tables.

Introduction
Background
Methods
Synthetic data
Experimental data
Model development and optimization
Seq2Seq model
Weight reduction and model quantization
Knowledge distillation
Empirical Evaluation
Weight reduction results with Seq2Seq model
Quantization results
Conclusion

Figures (2)

Figure 1: Experimental set-up and data. Schematic illustration of fluorescence lifetime imaging (FLI), time-resolved data capture, and the temporal point spread function (TPSF). The top-right panel shows experimental time-resolved fluorescence images of HER2+ tumor xenografts labeled with Alexa Fluor 700 conjugated to Trastuzumab in a nude mouse, captured at different time gates. The bottom-right panel presents the corresponding TPSF for a single pixel.
Figure 2: Model and training setup. (a) Gated recurrent unit (GRU); (b) time-resolved fluorescence images; (c) deep GRU-based encoder-decoder architecture (teacher), trained to deconvolve the pixel-wise TPSFs in (b) into SFDs, with the resulting deconvolved SFDs shown in (e); (f) single-layer encoder-decoder RNN model (student), derived from (c) using knowledge distillation (KD). The stack of time-resolved fluorescence images (b) and the deconvolved SFDs (e) are used to train (f). The student model learns hidden features through a combined loss function.
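The caption does not spell out the combined loss used to train the student. The following is a minimal sketch of one plausible form, assuming a weighted sum of an output term (student SFD versus the teacher's deconvolved SFD) and a hidden-feature term (student versus teacher hidden states). The weight `alpha`, the function names, and the assumption that student and teacher hidden features already share a dimension are all hypothetical; a student with a smaller hidden size would normally need a learned projection before its features can be compared with the teacher's.

```python
def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kd_combined_loss(
    student_sfd: list[float],
    teacher_sfd: list[float],
    student_hidden: list[float],
    teacher_hidden: list[float],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of an output-matching term and a hidden-feature term.

    alpha weights agreement with the teacher's deconvolved SFD; (1 - alpha)
    weights agreement between student and teacher hidden features, which
    are assumed here to already have the same dimension.
    """
    output_term = mse(student_sfd, teacher_sfd)
    feature_term = mse(student_hidden, teacher_hidden)
    return alpha * output_term + (1 - alpha) * feature_term


loss = kd_combined_loss([1.0, 2.0], [1.0, 3.0], [0.0, 0.0], [1.0, 1.0])
print(loss)  # 0.75
```

Setting `alpha` closer to 1 makes the student imitate only the teacher's outputs, while lower values push it to reproduce the teacher's internal representation as well.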
