Scalable FPGA Framework for Real-Time Denoising in High-Throughput Imaging: A DRAM-Optimized Pipeline using High-Level Synthesis
Weichien Liao
TL;DR
The paper tackles real-time denoising of PRISM-scale high-throughput imaging data by proposing a DRAM-optimized FPGA preprocessing pipeline implemented with Vitis HLS. It introduces three subtraction-and-averaging algorithms and burst-mode DRAM access, including a running-sum accumulation scheme that minimizes DRAM traffic and keeps latency below the inter-frame interval of $57~\mu\mathrm{s}$. The approach is validated on a Kintex UltraScale board with 2 GB of DRAM and a high-speed Phantom S710 camera, demonstrating sustained real-time throughput and favorable data reduction compared to CPU/GPU workflows, thanks to inline processing within the acquisition path. The results show scalability to multi-bank and multi-FPGA configurations, enabling inline preprocessing in broader high-throughput imaging pipelines for spectroscopy and microscopy.
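To put the latency budget in perspective, the inter-frame interval can be converted to a frame rate. This is a back-of-the-envelope sketch that assumes frames arrive at a uniform interval $T = 57~\mu\mathrm{s}$. The kernel clock $f_{\mathrm{clk}} = 300~\mathrm{MHz}$ in the second line is a hypothetical value chosen only for illustration, not a figure from the paper.

$$
f_{\mathrm{frame}} = \frac{1}{T} = \frac{1}{57 \times 10^{-6}~\mathrm{s}} \approx 1.75 \times 10^{4}~\text{frames/s}
$$

$$
N_{\mathrm{cycles}} = T \cdot f_{\mathrm{clk}} = 57 \times 10^{-6}~\mathrm{s} \times 300 \times 10^{6}~\mathrm{Hz} = 17{,}100~\text{cycles per frame (assumed clock)}
$$

Under these assumptions, each frame must be fully processed within roughly seventeen thousand clock cycles for the pipeline to keep pace with acquisition.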
Abstract
High-throughput imaging workflows, such as Parallel Rapid Imaging with Spectroscopic Mapping (PRISM), generate data at rates that exceed conventional real-time processing capabilities. We present a scalable FPGA-based preprocessing pipeline for real-time denoising, implemented via High-Level Synthesis (HLS) and optimized for DRAM-backed buffering. Our architecture performs frame subtraction and averaging directly on streamed image data, minimizing latency through burst-mode AXI4 interfaces. The resulting kernel completes its processing within the inter-frame interval, enabling inline denoising and reducing dataset size for downstream CPU/GPU analysis. Validated under PRISM-scale acquisition, this modular FPGA framework offers a practical solution for latency-sensitive imaging workflows in spectroscopy and microscopy.
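The idea of frame subtraction and averaging with a running sum can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the paper's kernel. The names `pixel_t`, `accum_t`, `accumulate_difference`, `finalize_average`, and `average_of_differences` are hypothetical, as are the 16-bit pixel container and the pairing of each signal frame with one background frame. The point it shows is that a single accumulator frame is updated per incoming pair, so the individual frames never need to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types, chosen for illustration only.
using pixel_t = std::uint16_t;  // raw camera sample (assumed 16-bit container)
using accum_t = std::int32_t;   // wide enough for a sum of signed differences

// Add (signal - background) to the running sum, one pixel per iteration.
// In a Vitis HLS kernel, pointer arguments like these would typically be
// mapped to m_axi ports and the loop pipelined so that the sequential
// accesses can be issued as bursts. That mapping is not shown here.
void accumulate_difference(const pixel_t* signal, const pixel_t* background,
                           accum_t* running_sum, std::size_t n_pixels) {
    for (std::size_t i = 0; i < n_pixels; ++i) {
        running_sum[i] += static_cast<accum_t>(signal[i]) -
                          static_cast<accum_t>(background[i]);
    }
}

// Divide the running sum by the number of accumulated pairs.
// Integer division truncates toward zero.
void finalize_average(const accum_t* running_sum, accum_t* average,
                      std::size_t n_pixels, accum_t n_pairs) {
    assert(n_pairs > 0);
    for (std::size_t i = 0; i < n_pixels; ++i) {
        average[i] = running_sum[i] / n_pairs;
    }
}

// Host-side reference: average of (signal - background) over all pairs.
// Assumes both lists have the same length and all frames the same size.
std::vector<accum_t> average_of_differences(
    const std::vector<std::vector<pixel_t>>& signals,
    const std::vector<std::vector<pixel_t>>& backgrounds) {
    assert(!signals.empty() && signals.size() == backgrounds.size());
    const std::size_t n_pixels = signals[0].size();
    std::vector<accum_t> running_sum(n_pixels, 0);
    for (std::size_t k = 0; k < signals.size(); ++k) {
        assert(signals[k].size() == n_pixels);
        assert(backgrounds[k].size() == n_pixels);
        accumulate_difference(signals[k].data(), backgrounds[k].data(),
                              running_sum.data(), n_pixels);
    }
    std::vector<accum_t> average(n_pixels, 0);
    finalize_average(running_sum.data(), average.data(), n_pixels,
                     static_cast<accum_t>(signals.size()));
    return average;
}
```

With this structure, each incoming pair costs one read and one write of the accumulator frame, regardless of how many pairs are averaged. That is the property a running-sum scheme relies on to limit DRAM traffic.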
