FERMI-ML: A Flexible and Resource-Efficient Memory-In-Situ SRAM Macro for TinyML acceleration

Mukul Lokhande; Akash Sankhe; S. V. Jaya Chand; Santosh Kumar Vishvakarma

TL;DR

FERMI-ML targets the energy and bandwidth costs of TinyML inference on AIoT devices with a flexible Memory-In-Situ (MIS) SRAM macro that computes inside the memory array. It combines a 9T XNOR-based bit-cell (RX9T) with a 22T 4:2 compressor to support variable-precision MAC and CAM operations in a 4 KB macro, which offers Normal, CAM, and PIM modes at Posit-4 or FP-4 precision. Post-layout results at 65 nm show 350 MHz operation at 0.9 V, with a throughput of 1.93 TOPS, an energy efficiency of 364 TOPS/W, and a Quality-of-Result (QoR) above 97.5% on InceptionV4 and ResNet-18. The result is a compact, reconfigurable MIS macro for mixed-precision TinyML workloads and LUT-based non-linear activations, with potential integration as an L3 cache in RISC-V edge AI SoCs.

Abstract

The growing demand for low-power and area-efficient TinyML inference on AIoT devices necessitates memory architectures that minimise data movement while sustaining high computational efficiency. This paper presents FERMI-ML, a Flexible and Resource-Efficient Memory-In-Situ (MIS) SRAM macro designed for TinyML acceleration. The proposed 9T XNOR-based RX9T bit-cell integrates a 5T storage cell with a 4T XNOR compute unit, enabling variable-precision MAC and CAM operations within the same array. A 22-transistor (C22T) compressor-tree-based accumulator supports logarithmic 1- to 64-bit MAC computation with lower delay and power than conventional adder trees. The 4 KB macro serves both in-situ computation and CAM-based lookup operations, supporting Posit-4 or FP-4 precision. Post-layout results at 65 nm show operation at 350 MHz and 0.9 V, delivering a throughput of 1.93 TOPS and an energy efficiency of 364 TOPS/W, while maintaining a Quality-of-Result (QoR) above 97.5% on InceptionV4 and ResNet-18. FERMI-ML thus demonstrates a compact, reconfigurable, and energy-aware digital Memory-In-Situ macro capable of supporting mixed-precision TinyML workloads.
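
The abstract names three building blocks without gate-level detail: an XNOR compute unit, CAM lookup, and a 4:2 compressor used for accumulation. The sketch below is a behavioural model of the textbook versions of these ideas, not the authors' RX9T or C22T circuits. The function names, the 0/1 encoding of -1/+1, and the two-full-adder compressor structure are all assumptions made for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4:2 compressor built from two full adders (assumed structure).

    Reduces five input bits of equal weight to one sum bit plus two
    carry bits of double weight, so that
    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def xnor_popcount_mac(weights, inputs):
    """Binary dot product via XNOR and popcount.

    Assumes each bit encodes a signed value: 1 -> +1, 0 -> -1.
    XNOR is 1 when the bits agree, i.e. when the signed product is +1.
    """
    matches = sum(1 - (w ^ x) for w, x in zip(weights, inputs))
    return 2 * matches - len(weights)


def cam_match(rows, key):
    """CAM-style lookup: indices of rows whose bits all XNOR-match the key."""
    return [i for i, row in enumerate(rows)
            if all(1 - (b ^ k) for b, k in zip(row, key))]
```

Under these assumptions, `xnor_popcount_mac([1, 0, 1, 1], [1, 1, 0, 1])` returns 0 (two agreeing and two disagreeing positions), and `cam_match([[1, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1]], [1, 0, 1, 1])` returns `[0, 2]`. The same XNOR comparison serves both functions: counting matches gives a MAC result, while requiring every bit to match gives a CAM hit.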
