
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging

Ismail Erbas, Vikas Pandey, Aporva Amarnath, Naigang Wang, Karthik Swaminathan, Stefan T. Radev, Xavier Intes

TL;DR

This work tackles the bottleneck of real-time fluorescence lifetime imaging (FLI) parameter extraction on FPGA hardware by compressing GRU-based Seq2Seq models. It employs a combination of weight reduction, post-training quantization (PTQ), quantization-aware training (QAT), and knowledge distillation (KD) to derive a hardware-friendly Seq2SeqLite that preserves accuracy. Notably, KD reduces model parameters by about 98% while maintaining performance, and 8-bit quantization further improves efficiency, enabling concurrent real-time FLI analysis during data capture. The study identifies the 32 × 32 Seq2SeqLite model with KD as a particularly effective configuration, offering a practical path toward FPGA-based, real-time FLI analysis in clinical and research settings.

Abstract

Fluorescence lifetime imaging (FLI) is an important technique for studying cellular environments and molecular interactions, but its real-time application is limited by slow data acquisition, which requires capturing large time-resolved images, and by complex post-processing with iterative fitting algorithms. Deep learning (DL) models enable real-time inference, but they can be computationally demanding because of their complex architectures and large matrix operations. This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware. Model compression is thus crucial for practical deployment of real-time inference. In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards. We perform an empirical evaluation of various compression techniques, including weight reduction, knowledge distillation (KD), post-training quantization (PTQ), and quantization-aware training (QAT), to reduce model size and computational load while preserving inference accuracy. Our compressed RNN model, Seq2SeqLite, achieves a balance between computational efficiency and prediction accuracy, particularly at 8-bit precision. By applying KD, the model parameter size was reduced by 98% while retaining performance, making the model suitable for concurrent real-time FLI analysis on FPGA during data capture. This work represents a significant step towards integrating hardware-accelerated real-time FLI analysis for fast biological processes.
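
To make the 8-bit precision mentioned above concrete, here is a minimal sketch of post-training quantization on a small list of weights. It assumes a symmetric, per-tensor scheme with a single scale factor; the abstract does not say which quantization scheme the authors use, and the function names and example weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 8-bit integers with one shared scale.

    Assumption: symmetric per-tensor quantization, integer range [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer, then clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# With round-to-nearest, the error per weight is at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Post-training quantization applies a mapping like this to an already trained model. Quantization-aware training instead simulates the rounding during training, so the weights can adapt to it.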

Paper Structure (13 sections, 1 equation, 2 figures, 3 tables)

This paper contains 13 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Experimental set-up and data. Schematic illustration of fluorescence lifetime imaging (FLI), time-resolved data capture, and the temporal point spread function (TPSF). The top-right panel shows experimental time-resolved fluorescence images of HER2+ tumor xenografts labeled with Alexa Fluor 700 conjugated to Trastuzumab in a nude mouse, captured at different time gates. The bottom-right panel presents the corresponding TPSF for a single pixel.
  • Figure 2: Model and training setup. (a) Gated Recurrent Unit (GRU); (b) Time-resolved fluorescence images; (c) Deep GRU-based encoder-decoder architecture (teacher), trained for TPSF deconvolution to pixel-wise SFD from (b), with the resulting deconvolved SFDs shown in (e); (f) Single-layer encoder-decoder RNN model (student), derived from (c) using the knowledge distillation (KD) method. The stack of time-resolved fluorescence images (b) and the deconvolved SFDs (e) are used to train (f). The student model learns hidden features using a combined loss function.
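
The combined loss mentioned in the Figure 2 caption can be sketched as a weighted sum of an output term and a hidden-feature term. The exact form used in the paper is not given in this summary, so the mean-squared-error terms, the weight `alpha`, and all function names below are assumptions. The sketch also assumes the student and teacher hidden features have already been brought to the same length.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student_out: Sequence[float],
    target_sfd: Sequence[float],
    student_hidden: Sequence[float],
    teacher_hidden: Sequence[float],
    alpha: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> float:
    """Combine an output-matching term with a hidden-feature-matching term."""
    # Output term: student prediction vs. the deconvolved SFD target.
    output_term = mse(student_out, target_sfd)
    # Feature term: student hidden state vs. the teacher's hidden state.
    feature_term = mse(student_hidden, teacher_hidden)
    return alpha * output_term + (1.0 - alpha) * feature_term


# Toy values chosen so the arithmetic is easy to follow:
# output_term = (0 + 4) / 2 = 2.0, feature_term = (1 + 1) / 2 = 1.0
loss = distillation_loss(
    student_out=[1.0, 2.0],
    target_sfd=[1.0, 4.0],
    student_hidden=[0.0, 0.0],
    teacher_hidden=[1.0, 1.0],
    alpha=0.5,
)
print(loss)  # 0.5 * 2.0 + 0.5 * 1.0 = 1.5
```

Setting `alpha` to 1.0 reduces this to plain supervised training on the SFD targets. Lower values put more weight on matching the teacher's internal representation.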