NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator

Mohit Upadhyay; Rohan Juneja; Weng-Fai Wong; Li-Shiuan Peh

TL;DR

NOVA introduces a NoC-based vector unit that performs on-chip non-linear approximation for attention layers by broadcasting slope and bias values across a NoC, enabling efficient overlay on existing edge accelerators. By replacing per-PE LUT storage with a line NoC broadcast, NOVA achieves substantial area, power, and energy savings while maintaining 1-cycle latency for common breakpoints. The approach is demonstrated across REACT, TPU v3/v4, and NVDLA architectures, showing up to $37.8\times$ power savings and several-fold area benefits over traditional LUT-based approximators, with notable energy improvements for BERT-like workloads. The work highlights the practical potential of using NoC wires for non-linear function approximation in attention-heavy models at the edge, enabling transformer-style workloads on compact accelerators with minimal hardware overhead.

Abstract

Attention mechanisms are becoming increasingly popular and are used in neural network models across multiple domains, such as natural language processing (NLP) and vision applications, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators because they have a much higher density of non-linear operations, which leads to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that performs non-linear operations within the NoC of the accelerator and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.
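
The approximation idea summarized above (a comparator picks a segment, then a MAC applies that segment's slope and bias) can be illustrated with a minimal software sketch. It is not the NOVA hardware or the authors' fitting procedure: the function names, the choice of GELU as the target, the input range of [-4, 4], and the endpoint-interpolation fit are all assumptions made for illustration. The only detail taken from the source is the use of 8 slope/bias pairs, matching the Figure 3 caption.

```python
import bisect
import math


def gelu(x):
    """Exact GELU, used here only as an example non-linear target (assumption)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_segments(fn, lo, hi, num_segments=8):
    """Return breakpoints and one (slope, bias) pair per segment.

    Hypothetical helper: each segment is the straight line through the
    function values at its two endpoints.
    """
    step = (hi - lo) / num_segments
    breakpoints = [lo + i * step for i in range(num_segments + 1)]
    pairs = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        slope = (fn(right) - fn(left)) / (right - left)
        bias = fn(left) - slope * left
        pairs.append((slope, bias))
    return breakpoints, pairs


def approximate(x, breakpoints, pairs):
    """Piecewise-linear evaluation: comparator step, then multiply-accumulate."""
    # Comparator step: locate the segment containing x, clamping out-of-range inputs.
    idx = bisect.bisect_right(breakpoints, x) - 1
    idx = min(max(idx, 0), len(pairs) - 1)
    slope, bias = pairs[idx]
    # MAC step: a single multiply-accumulate yields the approximation.
    return slope * x + bias


breakpoints, pairs = build_segments(gelu, -4.0, 4.0, num_segments=8)

# Worst-case absolute error on a dense grid over the fitted range.
max_err = max(
    abs(approximate(i / 100.0, breakpoints, pairs) - gelu(i / 100.0))
    for i in range(-400, 401)
)
```

In this sketch the per-segment table is a Python list; in the design described here, the slope and bias values are instead broadcast over the NoC rather than stored in a LUT at each PE.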

Paper Structure (27 sections, 8 figures, 4 tables)

Introduction
Background and Motivation
NOVA Hardware Architecture
NoC-based approximator hardware architecture
NOVA router micro-architecture (Fig. 3)
NOVA NoC routing
Integrating NOVA NoC with third-party accelerators
Integrating NOVA NoC with REACT (Fig. 5(a))
Integrating NOVA NoC with a Systolic Array Architecture (Fig. 5(b))
Integrating NOVA NoC with NVDLA (Fig. 5(c))
Mapping of Non-linear Operations on NoC Accelerators with NOVA
Evaluation
Experimental methodology
Synthesis results: Baseline LUT-based Vector Unit vs. NOVA
Synthesis Results: REACT integrated with NOVA
...and 12 more sections

Figures (8)

Figure 1: LUT-based approximator (shared by 256 neurons)
Figure 2: Walkthrough of approximation with LUT-based baseline
Figure 3: Architecture of NOVA router with the comparator and MAC. Each router has two input and output links, connected in a 1D line topology. Each input and output link is 257 bits wide, encompassing 16 words (8 pairs of slope and bias values) along with their corresponding tag bit.
Figure 4: Walkthrough of approximation using NOVA NoC
Figure 5: Integrating NOVA with (a) REACT WS routers, (b) TPU v3/v4 MXU, (c) NVDLA Convolution Core
...and 3 more figures
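
The link width quoted in the Figure 3 caption can be checked with a short arithmetic sketch. The caption gives 257 bits, 16 words, and a tag bit; the 16-bit word width computed below is an inference from those numbers, not something the caption states, and the sketch assumes a single tag bit per link.

```python
# Values taken from the Figure 3 caption.
LINK_BITS = 257        # width of each input/output link
WORDS_PER_LINK = 16    # 8 (slope, bias) pairs
PAIRS_PER_LINK = 8

# Assumption: exactly one tag bit accompanies the 16 words.
TAG_BITS = 1

payload_bits = LINK_BITS - TAG_BITS
# Inferred word width; an assumption that follows only if the payload
# is split evenly across the 16 words.
word_bits, remainder = divmod(payload_bits, WORDS_PER_LINK)
```

Under these assumptions the payload divides evenly, which is consistent with each slope and each bias occupying one equally sized word.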
