Fast Generation of Custom Floating-Point Spatial Filters on FPGAs
Nelson Campos, Eran Edirisinghe, Salva Chesnokov, Daniel Larkin
TL;DR
This work introduces FPGA-based implementations of both linear and nonlinear spatial filters that use custom floating-point arithmetic to balance precision against hardware footprint, enabling real-time 1080p60 video on a low-cost FPGA. It details a streaming window architecture with line buffers that supports 3×3 and 5×5 convolutions, median filtering, and generic nonlinear filters defined by a modular function, all implemented with pipelined custom floating-point cores. A domain-specific language (DSL) is presented that autogenerates SystemVerilog for custom floating-point blocks and automatically aligns latency across dependent operations; it is demonstrated with examples including a nonlinear filter and a 3×3 convolution. Hardware results show competitive throughput (e.g., 60 FPS at 1080p for a 3×3 convolution on a Zybo Z7-20) and favorable resource usage compared to fixed-point or HLS approaches, highlighting the DSL's potential for rapid prototyping of real-time image and video processing pipelines.
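The streaming window architecture mentioned above can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware: the function name `stream_conv3x3`, the list-based line buffers, and the choice to emit only fully valid windows (no border padding) are assumptions made for illustration. It shows how two line buffers and a 3×3 register window let each pixel be read exactly once in raster order.

```python
def stream_conv3x3(pixels, width, kernel):
    """Software model of a streaming 3x3 window (hypothetical helper).

    pixels: iterable of pixel values in raster order.
    width:  image width in pixels.
    kernel: 3x3 nested list of weights, applied without flipping
            (i.e. correlation, which equals convolution for symmetric kernels).
    Yields one result per fully valid window, in raster order.
    """
    line1 = [0] * width  # line buffer holding row y-1
    line2 = [0] * width  # line buffer holding row y-2
    win = [[0] * 3 for _ in range(3)]  # 3x3 window registers

    for i, p in enumerate(pixels):
        x, y = i % width, i // width

        # Read the two pixels above the incoming one before overwriting.
        top, mid = line2[x], line1[x]

        # Shift the window one column left and load the new right column.
        for r in range(3):
            win[r][0], win[r][1] = win[r][1], win[r][2]
        win[0][2], win[1][2], win[2][2] = top, mid, p

        # Age the line buffers: row y-1 becomes row y-2, current becomes y-1.
        line2[x] = mid
        line1[x] = p

        # The window is valid once three rows and three columns have arrived.
        if x >= 2 and y >= 2:
            yield sum(kernel[r][c] * win[r][c]
                      for r in range(3) for c in range(3))


# Usage: a 4x4 ramp image filtered with an all-ones kernel.
image = list(range(16))
box = [[1] * 3 for _ in range(3)]
print(list(stream_conv3x3(image, 4, box)))  # [45, 54, 81, 90]
```

In hardware, a design of this kind would typically accept one pixel per clock cycle, so 1080p60 corresponds to 1920 × 1080 × 60 = 124,416,000 active pixels per second. The multiply-accumulate in the final step is where pipelined arithmetic cores would sit.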
Abstract
Convolutional Neural Networks (CNNs) have been utilised in many image and video processing applications. The convolution operator, also known as a spatial filter, is usually a linear operation, but this linearity can compromise essential features and details that stem from the nonlinearity inherent in many applications. Nonlinear spatial filters, however, are slow to compute and are therefore a significant bottleneck in many software applications. Further, their complexity makes them difficult to accelerate in FPGA or VLSI architectures. This paper presents novel FPGA implementations of linear and nonlinear spatial filters. More specifically, the arithmetic computations are carried out in custom floating-point, enabling a trade-off between precision and hardware compactness and reducing algorithm development time. Further, we show that it is possible to process video at a resolution of 1080p with a frame rate of 60 frames per second using a low-cost FPGA board. Finally, we show that a domain-specific language enables rapid prototyping of image processing algorithms in custom floating-point arithmetic, allowing non-experts to quickly develop real-time video processing applications.
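To make the precision versus compactness trade-off concrete, the sketch below rounds a value to a floating-point format with a configurable exponent and mantissa width. It is an illustrative software model only: the function `quantize`, the IEEE-style bias, flushing of subnormals to zero, and saturation on overflow are assumptions chosen for simplicity and are not taken from the paper's cores.

```python
import math


def quantize(x, exp_bits, man_bits):
    """Round x to a custom float with exp_bits exponent bits and
    man_bits explicit mantissa bits (hypothetical model).

    Assumptions: IEEE-style bias, round-to-nearest-even, subnormals
    flushed to zero, overflow saturated to the largest finite value,
    and no infinities or NaNs.
    """
    if x == 0.0:
        return 0.0

    bias = (1 << (exp_bits - 1)) - 1
    mag = abs(x)

    # frexp gives mag = m * 2**e with 0.5 <= m < 1; shift to 1.f * 2**e.
    _, e = math.frexp(mag)
    e -= 1

    # Below the smallest normal exponent: flush to zero (assumption).
    if e < 1 - bias:
        return math.copysign(0.0, x)

    # Spacing between representable values in this binade.
    ulp = math.ldexp(1.0, e - man_bits)
    # Python's round() on floats rounds halves to even.
    q = round(mag / ulp) * ulp

    # Largest finite value: (2 - 2**-man_bits) * 2**bias (assumption).
    max_val = (2.0 - math.ldexp(1.0, -man_bits)) * math.ldexp(1.0, bias)
    q = min(q, max_val)

    return math.copysign(q, x)


# Usage: the same input at two precisions.
print(quantize(3.14159, 5, 10))  # 3.140625
print(quantize(3.14159, 5, 4))   # 3.125
```

Fewer mantissa bits mean narrower multipliers and adders in hardware at the cost of a larger rounding error, which is the trade-off that a configurable format exposes.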
