Siracusa: A 16 nm Heterogeneous RISC-V SoC for Extended Reality with At-MRAM Neural Engine
Arpan Suravi Prasad, Moritz Scherer, Francesco Conti, Davide Rossi, Alfio Di Mauro, Manuel Eggimann, Jorge Tomás Gómez, Ziyun Li, Syed Shakib Sarwar, Zhao Wang, Barbara De Salvo, Luca Benini
TL;DR
Siracusa is a 16 nm near-sensor heterogeneous SoC that tightly integrates the all-digital N-EUREKA neural engine with high-density MRAM weight memory. The At-MRAM approach doubles weight-transfer bandwidth and enables all-weights-on-chip inference, delivering up to 1.95 TOp/s and 8.84 TOp/J under realistic XR workloads. Core contributions include a dual-memory neural subsystem (MRAM for weights, SRAM for tiles), software-assisted virtual memory paging, and a tile-activation memory. Together these reduce end-to-end latency by up to 1.7x and energy by up to 3x compared with conventional L3-based schemes. The results show an area efficiency of 65.2 GOp/s/mm^2 and an end-to-end throughput of 698 GOp/s at 8-bit quantization, with practical implications for XR devices and wearable deployments.
Abstract
Extended reality (XR) applications are machine learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, and are tightly latency-bound (10-20 ms end-to-end) and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs depends strongly on the energy of non-volatile memory (NVM) access for network weights. This work introduces Siracusa, a near-sensor heterogeneous SoC for next-generation XR devices manufactured in 16 nm CMOS. Siracusa couples an octa-core cluster of RISC-V digital signal processing cores with a novel tightly-coupled "At-Memory" integration between a state-of-the-art digital neural engine called N-EUREKA and an on-chip NVM based on magnetoresistive memory (MRAM), achieving 1.7x higher throughput and 3x better energy efficiency than XR SoCs using NVM as background memory. The fabricated SoC prototype achieves an area efficiency of 65.2 GOp/s/mm^2 and a peak energy efficiency of 8.84 TOp/J for DNN inference while supporting complex heterogeneous application workloads, which combine ML with conventional signal processing and control.
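To make the XR constraints in the abstract concrete, the following back-of-envelope sketch relates a latency window and an average power budget to the number of operations that fit within them. It is an illustration only: the function names are hypothetical, the 20 mW / 10 ms operating point and the 1 GOp workload are assumed example values, and using the quoted peak efficiency of 8.84 TOp/J gives an optimistic upper bound, since peak efficiency need not hold across a full application workload.

```python
# Back-of-envelope budget check. Function names and example inputs are
# illustrative assumptions, not values or methods taken from the paper.

PEAK_EFFICIENCY_OP_PER_J = 8.84e12  # peak energy efficiency quoted in the abstract (8.84 TOp/J)


def energy_budget_joules(avg_power_w: float, latency_s: float) -> float:
    """Energy available per inference if the average power budget
    is spent entirely within one latency window."""
    return avg_power_w * latency_s


def max_ops_within_budget(
    avg_power_w: float,
    latency_s: float,
    efficiency_op_per_j: float = PEAK_EFFICIENCY_OP_PER_J,
) -> float:
    """Upper bound on operations per inference at a given energy efficiency."""
    return energy_budget_joules(avg_power_w, latency_s) * efficiency_op_per_j


def required_throughput_op_per_s(ops_per_inference: float, latency_s: float) -> float:
    """Sustained throughput needed to finish one inference inside the latency window."""
    return ops_per_inference / latency_s


if __name__ == "__main__":
    # Assumed operating point: 20 mW average power, 10 ms end-to-end latency.
    ops = max_ops_within_budget(avg_power_w=0.020, latency_s=0.010)
    print(f"Upper bound: {ops / 1e9:.2f} GOp per inference")

    # Hypothetical 1 GOp network that must complete within 10 ms.
    tput = required_throughput_op_per_s(ops_per_inference=1e9, latency_s=0.010)
    print(f"Required throughput: {tput / 1e9:.0f} GOp/s")
```

Under these assumed inputs, the energy budget is 200 µJ per inference, which at peak efficiency corresponds to roughly 1.77 GOp, and a 1 GOp network would need 100 GOp/s sustained to meet a 10 ms deadline.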
