Fast Generation of Custom Floating-Point Spatial Filters on FPGAs

Nelson Campos, Eran Edirisinghe, Salva Chesnokov, Daniel Larkin

TL;DR

This work introduces FPGA-based implementations of both linear and nonlinear spatial filters using custom floating-point arithmetic to balance precision and hardware footprint, enabling real-time 1080p60 video on a low-cost FPGA. It details a streaming window architecture with line buffers, supports 3×3 and 5×5 convolutions, median filtering, and generic nonlinear filters defined by a modular function, all implemented with pipelined custom FP cores. A domain-specific language is presented to autogenerate SystemVerilog for custom FP blocks, demonstrated with examples including a nonlinear filter and a 3×3 convolution, and it automatically handles latency alignment across dependent operations. Hardware results show competitive throughputs (e.g., 60 FPS at 1080p for 3×3 conv on Zybo Z7-20) and favorable resource usage compared to fixed-point or HLS approaches, highlighting the DSL’s potential for rapid prototyping of real-time image/video processing pipelines.
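The streaming window idea can be illustrated in software. The following is a minimal Python sketch, not the paper's hardware design: it assumes a 3×3 window fed one pixel per step in raster order, with two row-length FIFOs standing in for the line buffers. The names `stream_windows`, `convolve3x3`, and `median3x3` are hypothetical, and border handling is simply omitted (only fully valid windows are produced).

```python
from collections import deque
from statistics import median


def stream_windows(pixels, width):
    """Yield (row, col, window) for every fully valid 3x3 window.

    `pixels` arrives in raster order, one value per step, as in a video stream.
    (row, col) is the position of the window centre.
    """
    # Two line buffers, each one image row long (assumed model of the hardware FIFOs).
    line1 = deque([0] * width, maxlen=width)  # delays the stream by one row
    line2 = deque([0] * width, maxlen=width)  # delays the stream by two rows
    win = [[0] * 3 for _ in range(3)]         # 3x3 shift-register window

    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        above1 = line1[0]  # same column, previous row
        above2 = line2[0]  # same column, two rows up
        # Shift each window row left by one and insert the new column on the right.
        win[0] = win[0][1:] + [above2]
        win[1] = win[1][1:] + [above1]
        win[2] = win[2][1:] + [p]
        # Appending to a full deque with maxlen drops the oldest entry on the left.
        line2.append(above1)
        line1.append(p)
        if y >= 2 and x >= 2:
            yield y - 1, x - 1, [row[:] for row in win]


def convolve3x3(window, kernel):
    """Multiply-accumulate of window and kernel (no kernel flip, i.e. correlation)."""
    return sum(window[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def median3x3(window):
    """Median of the nine window values, a simple nonlinear spatial filter."""
    return median(v for row in window for v in row)


if __name__ == "__main__":
    image = list(range(16))            # 4x4 test image, values 0..15
    box = [[1] * 3 for _ in range(3)]  # unnormalised box kernel
    for y, x, w in stream_windows(image, width=4):
        print((y, x), convolve3x3(w, box), median3x3(w))
```

Because each new pixel needs only the two buffered rows plus the window registers, the memory cost of this scheme grows with the image width rather than the frame size, which is what makes a one-pixel-per-clock streaming design plausible in hardware.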

Abstract

Convolutional Neural Networks (CNNs) have been utilised in many image and video processing applications. The convolution operator, also known as a spatial filter, is usually a linear operation, but this linearity compromises essential features and details that depend on the nonlinearity present in many applications. Nonlinear spatial filters, however, are slow to compute, which makes them a significant bottleneck in many software applications. Further, due to their complexity, they are difficult to accelerate on FPGA or VLSI architectures. This paper presents novel FPGA implementations of linear and nonlinear spatial filters. More specifically, the arithmetic computations are carried out in custom floating-point, enabling a tradeoff between precision and hardware compactness and reducing algorithm development time. Further, we show that it is possible to process video at a resolution of 1080p with a frame rate of 60 frames per second using a low-cost FPGA board. Finally, we show that a domain-specific language allows the rapid prototyping of image processing algorithms in custom floating-point arithmetic, enabling non-experts to quickly develop real-time video processing applications.
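The tradeoff between precision and hardware compactness can be made concrete by rounding a value to a floating-point format with a chosen number of exponent and mantissa bits. The sketch below is an illustration under stated assumptions, not the paper's number format: it assumes an IEEE-like layout (implicit leading one, bias of 2^(exp_bits-1) - 1, all-ones exponent code reserved), round-to-nearest-even, flush-to-zero instead of subnormals, and saturation instead of infinity. The function name `quantize` is hypothetical.

```python
import math


def quantize(x, exp_bits, man_bits):
    """Round x to the nearest value of an assumed custom floating-point format."""
    if x == 0.0:
        return 0.0
    bias = (1 << (exp_bits - 1)) - 1
    min_exp = 1 - bias                     # smallest normal exponent (assumption)
    max_exp = (1 << exp_bits) - 2 - bias   # largest exponent, top code reserved (assumption)

    m, e = math.frexp(abs(x))              # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = e - 1                            # rewrite as (1.f) * 2**exp
    scale = 1 << man_bits
    frac = round((2 * m - 1) * scale)      # Python's round() resolves ties to even
    if frac == scale:                      # mantissa rounded up to 2.0: renormalise
        frac = 0
        exp += 1

    if exp < min_exp:                      # too small: flush to zero (assumption)
        return math.copysign(0.0, x)
    if exp > max_exp:                      # too large: saturate (assumption)
        frac, exp = scale - 1, max_exp
    return math.copysign((1 + frac / scale) * 2.0 ** exp, x)


if __name__ == "__main__":
    # Fewer mantissa bits mean coarser values but narrower arithmetic units.
    for man_bits in (2, 4, 10, 23):
        print(man_bits, quantize(3.14159, 5 if man_bits <= 10 else 8, man_bits))
```

With 2 mantissa bits the value 3.14159 rounds to 3.0, and with 4 bits it rounds to 3.125. Each extra mantissa bit halves the worst-case rounding step and widens every adder and multiplier in the datapath.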

Paper Structure

This paper contains 12 sections, 3 equations, 16 figures, 1 table.

Figures (16)

  • Figure 1: Window generation structure for a filter of dimension $3\times 3$
  • Figure 2: Window generation structure for a filter of dimension $5\times 5$
  • Figure 3: Line buffer memory inferred as dual-port RAM
  • Figure 4: 2D convolution with a $3\times 3$ kernel
  • Figure 5: A 3-stage pipelined adder tree with eight inputs
  • ...and 11 more figures