Quadrilatero: A RISC-V programmable matrix coprocessor for low-power edge applications

Danilo Cammarata; Matteo Perotti; Marco Bertuletti; Angelo Garofalo; Pasquale Davide Schiavone; David Atienza; Luca Benini

TL;DR

Quadrilatero addresses the vector register file (VRF) bandwidth bottleneck of vector-based edge accelerators by introducing a RISC-V programmable matrix coprocessor with a dedicated matrix ISA and a systolic-array MAC engine. The design, implemented as a coprocessor for an RV32I core and evaluated post-synthesis in a $65$-nm process, occupies about $0.65\,\mathrm{mm}^2$ and reaches up to $99.4\%$ FPU utilization on $64\times64$ MatMuls. Compared to a state-of-the-art open-source RISC-V vector processor and a hybrid vector-matrix processor, the open-source design improves area efficiency by up to $77\%$ and energy efficiency by up to $15\%$ across the evaluated configurations, supporting a matrix ISA as a practical approach for high-arithmetic-density edge AI workloads.

Abstract

The rapid growth of AI-based Internet-of-Things applications has increased the demand for high-performance edge processing engines that operate within a low power budget and tight area constraints. As a consequence, vector processor architectures, traditionally designed for high-performance computing (HPC), have made their way into edge devices, promising high utilization of floating-point units (FPUs) and low power consumption. However, vector processors can only exploit a single dimension of parallelism, leading to expensive accesses to the vector register file (VRF) when performing matrix computations, which are pervasive in AI workloads. To overcome these limitations while guaranteeing programmability, many researchers and companies are developing dedicated instructions for more efficient matrix multiplication (MatMul) execution. In this context, we propose Quadrilatero, an open-source RISC-V programmable systolic array coprocessor for low-power edge applications that implements a streamlined matrix ISA extension. We evaluate the post-synthesis power, performance, and area (PPA) metrics of Quadrilatero in a mature 65-nm technology node, showing that it requires only $0.65\,\mathrm{mm}^2$ and that it can reach up to 99.4% FPU utilization. Compared to a state-of-the-art open-source RISC-V vector processor and a hybrid vector-matrix processor optimized for embedded applications, Quadrilatero improves area efficiency and energy efficiency by up to 77% and 15%, respectively.
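To make the FPU utilization figure concrete, the following is a minimal sketch of how such a number is commonly computed: useful multiply-accumulate (MAC) operations divided by the MAC slots available over the measured run time. It assumes each FPU retires one MAC per cycle. The FPU count (`NUM_FPUS = 16`) and the function names are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def ideal_cycles(m: int, n: int, k: int, num_fpus: int) -> int:
    """Lower bound on cycles for an (m x k) by (k x n) MatMul.

    Assumes every FPU retires exactly one MAC per cycle (an assumption
    of this sketch, not a statement about the actual hardware).
    """
    total_macs = m * n * k
    return -(-total_macs // num_fpus)  # ceiling division


def fpu_utilization(m: int, n: int, k: int, num_fpus: int, measured_cycles: int) -> float:
    """Fraction of available MAC slots that performed useful work."""
    return (m * n * k) / (num_fpus * measured_cycles)


NUM_FPUS = 16  # hypothetical FPU count, for illustration only

# A 64x64 by 64x64 MatMul needs 64 * 64 * 64 = 262144 MACs.
best_case = ideal_cycles(64, 64, 64, NUM_FPUS)

# Any overhead cycles (loads, stores, pipeline fill) lower the utilization.
with_overhead = fpu_utilization(64, 64, 64, NUM_FPUS, best_case + 99)

print(best_case, round(with_overhead, 3))
```

Under these assumptions, a utilization close to 1 means that almost every cycle of the run keeps all FPUs busy with useful MACs, so the remaining cycles are the only room left for data movement and control overhead.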
