LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators

Changhong Li, Biswajit Basu, Shreejith Shanker

TL;DR

This work addresses the poor exploitation of unstructured sparsity in quantised neural networks (QNNs) deployed on resource-limited hardware. It introduces a framework that embeds unstructured sparsity directly into dataflow FPGA accelerators, which removes the need for dedicated sparse engines, and pairs it with hardware-aware pruning. A design-space exploration loop jointly optimises pruning and folding so that parallelism is preserved, enabling hardware-software co-design and advancing the Pareto front of the design space. On LeNet-5, the framework reaches 51.6× compression and a 1.23× throughput improvement while using only 5.12% of LUTs, supporting its practical viability for efficient QNN acceleration on FPGA-based edge devices.
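One way to picture an engine-free treatment of unstructured sparsity is compile-time specialisation: if a quantised weight row is fixed at build time, the generator can simply omit every pruned term, so no index metadata or sparse decoder is needed at run time. The Python sketch below is an assumed software analogy of that idea, not the paper's implementation; a hardware flow would emit HDL or HLS rather than Python source, and the helper names `emit_sparse_dot` and `compile_sparse_dot` are hypothetical.

```python
from typing import Callable, List, Sequence


def emit_sparse_dot(weights: Sequence[int], name: str = "neuron") -> str:
    """Emit straight-line source for one neuron with its pruned weights baked in.

    Zero weights generate no term at all, so the sparsity pattern lives in the
    structure of the generated code rather than in an index array.
    """
    terms: List[str] = []
    for i, w in enumerate(weights):
        if w == 0:
            continue  # pruned weight: nothing is generated for it
        if w == 1:
            terms.append(f"x[{i}]")  # unit weight: a plain wire, no multiplier
        elif w == -1:
            terms.append(f"-x[{i}]")  # negated input, still no multiplier
        else:
            terms.append(f"{w} * x[{i}]")  # multiply by a constant
    body = " + ".join(terms) if terms else "0"
    return f"def {name}(x):\n    return {body}\n"


def compile_sparse_dot(weights: Sequence[int]) -> Callable[[Sequence[int]], int]:
    """Turn the emitted source into a callable (a stand-in for synthesis)."""
    namespace: dict = {}
    exec(emit_sparse_dot(weights, "neuron"), namespace)
    return namespace["neuron"]


if __name__ == "__main__":
    w = [0, 3, -1, 0, 0, 2]  # a quantised row with half of its weights pruned
    print(emit_sparse_dot(w))  # only indices 1, 2 and 5 appear in the body
    print(compile_sparse_dot(w)([5, 2, 1, 9, 4, 3]))
```

The generated body for the example row is `3 * x[1] + -x[2] + 2 * x[5]`: the irregular sparsity pattern has become fixed structure, so nothing is fetched or decoded at run time.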

Abstract

FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference. However, the complexity of modern deep-learning models limits their performance on resource-constrained edge devices. While quantisation and pruning alleviate these challenges, unstructured sparsity remains underexploited because it leads to irregular memory access. This work introduces a framework that embeds unstructured sparsity into dataflow accelerators, eliminating the need for dedicated sparse engines and preserving parallelism. A hardware-aware pruning strategy is also introduced to further improve efficiency and streamline the design flow. On LeNet-5, the framework attains 51.6× compression and a 1.23× throughput improvement using only 5.12% of LUTs, effectively exploiting unstructured sparsity for QNN acceleration.
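Quantisation and pruning compound: each kept weight needs fewer bits, and fewer weights are kept. The sketch below is a hypothetical storage model, not the paper's definition of its compression figure. It compares a dense 32-bit baseline with a pruned, quantised layer and lets the caller add per-nonzero index bits, as a conventional sparse format would need. Treating an engine-free design as needing zero index bits is an assumption made for illustration, and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(num_weights: int, nonzeros: int, weight_bits: int,
                      baseline_bits: int = 32, index_bits: int = 0) -> float:
    """Ratio of dense baseline storage to pruned, quantised storage.

    index_bits models per-nonzero metadata (e.g. a column index in a CSR-style
    sparse engine). An engine-free design that hardcodes the sparsity pattern
    is modelled here, by assumption, as index_bits = 0.
    """
    if nonzeros <= 0:
        raise ValueError("need at least one nonzero weight")
    dense_bits = num_weights * baseline_bits
    sparse_bits = nonzeros * (weight_bits + index_bits)
    return dense_bits / sparse_bits


if __name__ == "__main__":
    print(compression_ratio(10_000, 1_000, weight_bits=4))                 # no index overhead
    print(compression_ratio(10_000, 1_000, weight_bits=4, index_bits=10))  # 10-bit column index
```

With 10,000 weights pruned to 1,000 nonzeros at 4 bits, the model gives 80× with no index overhead but only about 22.9× once a 10-bit column index accompanies every nonzero. This illustrates how sparse-format metadata can erode the benefit of unstructured sparsity.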

Paper Structure

This paper contains 3 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Workflow of automated pruning and folding decisions
  • Figure 2: Estimated latency and LUT utilization per layer of LeNet-5 under different folding and pruning strategies
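Figures 1 and 2 concern automated pruning and folding decisions and per-layer latency and LUT estimates. The sketch below is a hypothetical illustration of how such a loop can be wired up, not the authors' tool. It uses a FINN-style cycle estimate for a matrix-vector unit, (MH/PE)·(MW/SIMD)·(output vectors), an invented LUT model in which pruned weights scale down the cost of each parallel lane, and a simple greedy rule that keeps unfolding the slowest layer until a LUT budget would be exceeded. The names `Layer`, `fold`, `luts`, `throughput_fps` and the constant `lut_per_mac` are all hypothetical, and the layer shapes are the classic LeNet-5 dimensions rather than figures from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    name: str
    mh: int           # matrix height: output channels or neurons
    mw: int           # matrix width: fan-in (kernel area x input channels)
    out_pixels: int   # output vectors per frame (1 for fully connected layers)
    density: float    # fraction of weights kept after pruning (hypothetical)
    pe: int = 1       # output-side parallelism; must divide mh
    simd: int = 1     # input-side parallelism; must divide mw


def cycles(layer: Layer) -> int:
    """FINN-style cycle estimate for one matrix-vector unit per frame."""
    return (layer.mh // layer.pe) * (layer.mw // layer.simd) * layer.out_pixels


def luts(layer: Layer, lut_per_mac: float = 20.0) -> float:
    """Hypothetical LUT model: grows with parallel lanes, shrinks with pruning."""
    return lut_per_mac * layer.pe * layer.simd * layer.density


def throughput_fps(layers: List[Layer], f_clk_hz: float) -> float:
    """A dataflow pipeline runs at the rate of its slowest layer."""
    return f_clk_hz / max(cycles(l) for l in layers)


def _divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def _next_folding(layer: Layer) -> Optional[Tuple[int, int]]:
    """Next (pe, simd) with more parallelism: grow SIMD first, then PE."""
    for d in _divisors(layer.mw):
        if d > layer.simd:
            return layer.pe, d
    for d in _divisors(layer.mh):
        if d > layer.pe:
            return d, layer.simd
    return None  # the layer is fully unrolled


def fold(layers: List[Layer], lut_budget: float) -> None:
    """Greedily unfold the bottleneck layer until the next step would exceed the budget."""
    while True:
        bottleneck = max(layers, key=cycles)
        step = _next_folding(bottleneck)
        if step is None:
            return  # the slowest layer cannot be unfolded further
        previous = (bottleneck.pe, bottleneck.simd)
        bottleneck.pe, bottleneck.simd = step
        if sum(luts(l) for l in layers) > lut_budget:
            bottleneck.pe, bottleneck.simd = previous  # undo the step that overshot
            return


def lenet_like(density: float) -> List[Layer]:
    """Classic LeNet-5 layer shapes (assumed; the paper's variant may differ)."""
    return [
        Layer("conv1", 6, 25, 28 * 28, density),
        Layer("conv2", 16, 150, 10 * 10, density),
        Layer("fc1", 120, 400, 1, density),
        Layer("fc2", 84, 120, 1, density),
        Layer("fc3", 10, 84, 1, density),
    ]


if __name__ == "__main__":
    dense, pruned = lenet_like(1.0), lenet_like(0.25)
    fold(dense, lut_budget=5000)
    fold(pruned, lut_budget=5000)
    print(throughput_fps(dense, 100e6), throughput_fps(pruned, 100e6))
```

Under this toy model, a lower weight density lets the same LUT budget buy more parallelism for the bottleneck layers, which is one way pruning can translate into throughput in an engine-free dataflow design. How the generated logic actually maps to LUTs on a real device would have to replace the invented cost model.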