Scalable FPGA Framework for Real-Time Denoising in High-Throughput Imaging: A DRAM-Optimized Pipeline using High-Level Synthesis

Weichien Liao

TL;DR

The paper tackles real-time denoising of PRISM-scale high-throughput imaging data by proposing a DRAM-optimized FPGA preprocessing pipeline implemented with Vitis HLS. It introduces three subtraction-and-averaging algorithms and burst-mode DRAM access, including a running-sum accumulation that minimizes DRAM traffic and keeps latency below the inter-frame interval of $57~\mu s$. The approach is validated on a Kintex UltraScale board with 2 GB DRAM and a high-speed Phantom S710 camera, demonstrating sustained real-time throughput and favorable data reduction compared to CPU/GPU workflows, thanks to inline processing within the acquisition path. The results show scalability to multi-bank and multi-FPGA configurations, enabling inline preprocessing in broader high-throughput imaging pipelines for spectroscopy and microscopy.

Abstract

High-throughput imaging workflows, such as Parallel Rapid Imaging with Spectroscopic Mapping (PRISM), generate data at rates that exceed conventional real-time processing capabilities. We present a scalable FPGA-based preprocessing pipeline for real-time denoising, implemented via High-Level Synthesis (HLS) and optimized for DRAM-backed buffering. Our architecture performs frame subtraction and averaging directly on streamed image data, minimizing latency through burst-mode AXI4 interfaces. The resulting kernel operates below the inter-frame interval, enabling inline denoising and reducing dataset size for downstream CPU/GPU analysis. Validated under PRISM-scale acquisition, this modular FPGA framework offers a practical solution for latency-sensitive imaging workflows in spectroscopy and microscopy.
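To make the real-time constraint concrete, the sketch below converts the $57~\mu s$ inter-frame interval quoted in the TL;DR into an equivalent frame rate and checks a kernel latency against that budget. The function names and the example latency of 40 µs are illustrative assumptions, not measurements reported in the paper.

```python
# Inter-frame interval quoted in the TL;DR, in microseconds.
INTER_FRAME_INTERVAL_US = 57.0


def frames_per_second(interval_us: float) -> float:
    """Convert an inter-frame interval in microseconds to a frame rate."""
    return 1e6 / interval_us


def fits_in_budget(kernel_latency_us: float,
                   interval_us: float = INTER_FRAME_INTERVAL_US) -> bool:
    """Return True if a per-frame kernel latency is strictly below the interval."""
    return kernel_latency_us < interval_us


if __name__ == "__main__":
    print(f"{frames_per_second(INTER_FRAME_INTERVAL_US):.0f} frames/s")
    # 40 us is a hypothetical latency, used only to exercise the check.
    print(fits_in_budget(40.0))
```

A 57 µs interval corresponds to roughly 17,544 frames per second, so any inline kernel has to finish its per-frame work inside that window to avoid falling behind the camera.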

Paper Structure

This paper contains 10 sections, 13 equations, 8 figures, 10 tables, 3 algorithms.

Figures (8)

  • Figure 1: Schematic overview of the PRISM experimental setup used for ultrafast spectroscopic imaging. BS: Beam Splitter; PD: Photodiode; CM: Curved Mirror; OBJ: Objective; SP: Short-Pass Filter; TL: Tube Lens. The diagram illustrates beam paths, optical elements, and synchronization components, including the collinear pump–probe configuration and detection system interfaced with a high-speed camera.
  • Figure 2: Frame-wise subtraction and averaging schema used during PRISM preprocessing. For each experiment, sequential scans generate alternating excitation and control frames. Subtraction between consecutive frames isolates excitation-induced signals, which are subsequently averaged across all groups to suppress random noise and enhance signal fidelity. Each image result is computed as the groupwise average of difference frames and represents the final denoised output.
  • Figure 3: Dataflow diagram for the CustomLogic FPGA module integrated within the Coaxlink Octo frame grabber. Incoming pixel data are streamed through the CoaXPress interface into the memory controller and then forwarded to the CustomLogic region. Preprocessing kernels perform real-time pixel subtraction and averaging, with intermediate data buffered in DDR4 via the AXI4 protocol and final results routed to host memory via PCIe.
  • Figure 4: Dataflow illustration for Algorithm 3, highlighting optimized burst-mode interactions with on-board DRAM. Incoming image frames undergo pixel-wise subtraction and incremental accumulation, with results written to DRAM as a running sum. Read and write operations are burst-enabled through the AXI4 interface, improving throughput and minimizing latency. The diagram outlines the sequence of memory transactions and accumulation logic across frame groups, culminating in the final averaged output.
  • Figure 5: Experimental hardware setup for PRISM emulation, integrating FPGA-based preprocessing within the acquisition pipeline. A Phantom S710 high-speed camera captures dynamic screen patterns illuminated by two LEDs: one modulated to simulate transient excitation, the other static as background noise. Image data are streamed via CoaXPress into the Coaxlink Octo frame grabber, where an embedded Xilinx FPGA performs real-time denoising before host transfer. The configuration mimics excitation-driven workflows and validates low-latency performance under realistic operating conditions.
  • ...and 3 more figures
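
The subtraction-and-averaging scheme in the Figure 2 and Figure 4 captions can be sketched as a small software model. This is a minimal illustration, not the paper's HLS kernel: it assumes frames arrive as flat pixel lists in the order excitation, control, excitation, control, and it uses a Python list (`acc`) to stand in for the running sum that the captions describe as held in DRAM. The function names are hypothetical.

```python
def running_sum_denoise(frames):
    """Average the pairwise differences of alternating frames with one accumulator.

    Assumption: frames alternate excitation, control, excitation, control, ...
    Only a single running-sum buffer is kept, mirroring the DRAM-resident sum.
    """
    acc = None      # running sum of difference frames (models the DRAM buffer)
    groups = 0      # number of excitation/control pairs processed so far
    it = iter(frames)
    for excitation in it:
        control = next(it, None)
        if control is None:
            raise ValueError("frames must come in excitation/control pairs")
        diff = [e - c for e, c in zip(excitation, control)]
        # Read-modify-write of the accumulator, one pass per frame pair.
        acc = diff if acc is None else [a + d for a, d in zip(acc, diff)]
        groups += 1
    if groups == 0:
        raise ValueError("at least one frame pair is required")
    return [a / groups for a in acc]


def store_all_denoise(frames):
    """Reference version that stores every difference frame before averaging."""
    frames = list(frames)
    diffs = [[e - c for e, c in zip(frames[i], frames[i + 1])]
             for i in range(0, len(frames) - 1, 2)]
    return [sum(column) / len(diffs) for column in zip(*diffs)]


if __name__ == "__main__":
    # Two groups of 2-pixel frames; differences are [6, 7] and [6, 9].
    demo = [[10, 12], [4, 5], [11, 13], [5, 4]]
    print(running_sum_denoise(demo))  # [6.0, 8.0]
```

Both functions return the same average, but the running-sum version keeps one frame-sized buffer regardless of the number of groups, which is the property the Figure 4 caption associates with reduced DRAM traffic. A hardware implementation would presumably use fixed-width integer accumulators rather than Python numbers.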