NeuroBlend: Towards Low-Power yet Accurate Neural Network-Based Inference Engine Blending Binary and Fixed-Point Convolutions

Arash Fayyazi; Mahdi Nazemi; Arya Fayyazi; Massoud Pedram

TL;DR

NeuroBlend tackles the challenge of low-latency, low-power neural inference by blending binary and fixed-point convolutions within a novel Blend module and deploying a DSP-enabled FPGA accelerator. A supporting compiler enables efficient mapping and fusion of BN/CONV operations, while the hardware design exploits a heterogeneous architecture to maximize throughput and minimize power. Empirical results show NeuroBlend-20 achieves competitive CIFAR-10/100 accuracy with higher throughput than prior BNNs, and the BlendMixer extension to MLPMixer reduces memory footprint while preserving accuracy. Overall, the work demonstrates a practical co-design path for deploying accurate, resource-efficient neural networks on FPGA platforms with broad implications for edge AI and latency-critical applications.

Abstract

This paper introduces NeuroBlend, a novel neural network architecture featuring a unique building block known as the Blend module. This module incorporates binary and fixed-point convolutions in its main and skip paths, respectively. Batch normalizations are judiciously deployed on both main and skip paths inside the Blend module and in between consecutive Blend modules. Additionally, we present a compiler and hardware architecture designed to map NeuroBlend models onto FPGA devices, aiming to minimize inference latency while maintaining high accuracy. Our NeuroBlend-20 (NeuroBlend-18) model, derived from ResNet-20 (ResNet-18) trained on CIFAR-10 (CIFAR-100), achieves 88.0% (73.73%) classification accuracy, outperforming state-of-the-art binary neural networks by 0.8% (1.33%), with an inference time of 0.38 ms per image, 1.4x faster than the previous FPGA implementation for BNNs. Similarly, our BlendMixer model for CIFAR-10 attains 90.6% accuracy (1.59% less than the full-precision MLPMixer), with a 3.5x reduction in model size compared to the full-precision MLPMixer. Furthermore, leveraging DSP blocks for 48-bit bitwise logic operations enables low-power FPGA implementation, yielding a 2.5x reduction in power consumption.
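
To make the 48-bit bitwise logic operations mentioned above more concrete, the following is a minimal software sketch of the XNOR-popcount dot product commonly used in binary convolutions, with operands packed into 48-bit words. It is an illustration under stated assumptions, not the paper's hardware design: the names `WORD_BITS`, `pack_signs`, and `binary_dot` are hypothetical, and the mapping of this arithmetic onto DSP blocks is not shown.

```python
WORD_BITS = 48  # assumed word width, matching the 48-bit logic operations in the abstract


def pack_signs(values):
    """Pack a sequence of +1/-1 values into 48-bit integer words (bit 1 encodes +1)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, value in enumerate(values[start:start + WORD_BITS]):
            if value > 0:
                word |= 1 << offset
        words.append(word)
    return words


def binary_dot(a_words, b_words, length):
    """Dot product of two +1/-1 vectors, computed from packed words via XNOR and popcount."""
    matches = 0
    remaining = length
    for a, b in zip(a_words, b_words):
        valid = min(remaining, WORD_BITS)
        mask = (1 << valid) - 1          # ignore padding bits in the last word
        xnor = ~(a ^ b) & mask           # bit is 1 where the two signs agree
        matches += bin(xnor).count("1")  # popcount
        remaining -= valid
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - length
```

Each loop iteration processes up to 48 multiply-accumulate terms with one XNOR and one popcount, which is the property that makes wide bitwise units attractive for the binary path.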

Paper Structure (19 sections, 3 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Preliminaries
MLPMixer
Conventional BNN
Prior Work on State-of-the-art BNN Architectures
The Proposed Building Blocks
Proposed Accelerator Design
Compiler Optimization
Accelerator Architecture
BNN and FPNN Block
FP/B Joint Blocks
Experimental Results
Experimental setup
ResNet-18 and ResNet-20
Hardware cost and performance of NeuroBlend
...and 4 more sections

Figures (4)

Figure 1: Overview of MLPMixer model. (a) Overall MLPMixer and (b) mixing block architectures.
Figure 2: The proposed building blocks. The differences with respect to real-to-binary blocks are highlighted in red.
Figure 3: The proposed Mixing block in this paper.
Figure 4: (a) The proposed building block. (b) The BN from the preceding NeuroBlend block is distributed to both the main and skip paths. (c) Components are merged, resulting in the optimized NeuroBlend block. The red and yellow dashed boxes show the preceding and current building blocks, respectively. The green dashed boxes show the potential components for merging.
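
The merging step in Figure 4 is not spelled out on this page, so the following is only a sketch of one standard folding technique, assuming an inference-time batch normalization that directly precedes a linear operation (written here as a matrix-vector product, which corresponds to a 1x1 convolution at a single pixel). The helper names `fold_bn_into_linear`, `batch_norm`, and `matvec` are hypothetical and are not taken from the paper or its compiler.

```python
import math


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Per-channel inference-time batch normalization of a vector x."""
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, g, b, m, v in zip(x, gamma, beta, mean, var)]


def matvec(weight, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def fold_bn_into_linear(weight, gamma, beta, mean, var, eps=1e-5):
    """Return (folded_weight, folded_bias) such that
    matvec(weight, batch_norm(x)) == matvec(folded_weight, x) + folded_bias."""
    # BN is an affine map per input channel: scale * x + shift.
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    # Scale each input-channel column of the weight; push the shift into a bias.
    folded_weight = [[w * s for w, s in zip(row, scale)] for row in weight]
    folded_bias = [sum(w * t for w, t in zip(row, shift)) for row in weight]
    return folded_weight, folded_bias
```

After folding, the normalization costs nothing at inference time because its scale and shift live in the weights and bias. For spatial convolutions with zero padding the border pixels need extra care, which this sketch does not cover.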
