TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios
Yichao Zhang, Marco Bertuletti, Samuel Riedel, Matheus Cavalcante, Alessandro Vanelli-Coralli, Luca Benini
TL;DR
The paper addresses the demand for high-throughput, power-efficient SDR processing in next-generation RANs. It introduces TeraPool-SDR, a physically aware, three-level hierarchical many-core cluster of $1024$ Snitch RV32 cores sharing a $4\,\mathrm{MiB}$ L1 memory split into $4096$ banks, implemented in GF 12nm LP+. The authors detail the Tile/SubGroup/Group hierarchy, the PE-to-L1 interconnect, and the full physical implementation, including latency-throughput trade-offs and area and power analyses, with an emphasis on open-source design. Across the FFT, MatMul, Channel Estimation, and Linear System Inversion kernels, the design achieves $61$–$125\,\mathrm{GOPS/W}$ while consuming less than $10\,\mathrm{W}$, and it reaches a peak of $1.89\,\mathrm{TOPS}$ at clock frequencies approaching $1\,\mathrm{GHz}$ ($730$–$924\,\mathrm{MHz}$). This suggests a viable path toward open, programmable PHY stacks in 5G/6G base stations, with potential further scaling through 3D-IC integration.
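To make the shared-L1 organization concrete, here is a minimal Python sketch of how a byte address could be mapped onto one of the 4096 banks and then decoded into Group, SubGroup, and Tile coordinates. Only the totals (4 MiB, 4096 banks, 1024 cores, three hierarchy levels) come from the summary above; the plain word-interleaved mapping, the 4 banks per core, and the split into 4 Groups × 4 SubGroups × 8 Tiles are assumptions chosen for illustration, and `locate` and `BankLocation` are hypothetical names.

```python
from typing import NamedTuple

# Totals from the paper summary; the per-level split below is an ASSUMPTION.
L1_BYTES = 4 * 1024 * 1024      # 4 MiB shared L1
NUM_BANKS = 4096
WORD_BYTES = 4                  # one RV32 word per bank access
BANKS_PER_TILE = 32             # assumption: 8 cores x 4 banks per core
TILES_PER_SUBGROUP = 8          # assumption
SUBGROUPS_PER_GROUP = 4         # assumption
NUM_GROUPS = 4                  # assumption

BANK_BYTES = L1_BYTES // NUM_BANKS  # 1 KiB per bank
assert BANKS_PER_TILE * TILES_PER_SUBGROUP * SUBGROUPS_PER_GROUP * NUM_GROUPS == NUM_BANKS


class BankLocation(NamedTuple):
    group: int
    subgroup: int
    tile: int
    bank_in_tile: int
    row: int  # word offset inside the selected bank


def locate(addr: int) -> BankLocation:
    """Map a byte address to a bank, assuming plain word interleaving."""
    if not 0 <= addr < L1_BYTES:
        raise ValueError("address outside the shared L1")
    word = addr // WORD_BYTES
    bank = word % NUM_BANKS    # consecutive words land in consecutive banks
    row = word // NUM_BANKS    # which word within that bank
    # Peel off one hierarchy level at a time, innermost first.
    tile_global, bank_in_tile = divmod(bank, BANKS_PER_TILE)
    subgroup_global, tile = divmod(tile_global, TILES_PER_SUBGROUP)
    group, subgroup = divmod(subgroup_global, SUBGROUPS_PER_GROUP)
    return BankLocation(group, subgroup, tile, bank_in_tile, row)


print(locate(0))             # first word: Group 0, SubGroup 0, Tile 0, bank 0
print(locate(L1_BYTES - 4))  # last word: last bank of the last Tile
```

Under this assumed scheme, a core streaming through an array spreads its accesses across every bank in the cluster. A real design may instead reserve tile-local address regions to reduce average latency, which this sketch does not model.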
Abstract
Radio Access Network (RAN) workloads are rapidly scaling up in data-processing intensity and throughput as 5G (and beyond) standards grow in the number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but the staggering performance requirements demand a large number of PEs combined with extreme Power, Performance and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of TeraPool-SDR, a cluster for Software-Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs sharing a global view of a 4 MiB L1 memory split into 4096 banks. We report several feasible configurations of TeraPool-SDR featuring an ultra-high-bandwidth PE-to-L1-memory interconnect, clocked at 730 MHz, 880 MHz, and 924 MHz (TT/0.80 V/25 °C) in 12 nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all key SDR kernels for 5G RANs: Fast Fourier Transform (93 GOPS/W), Matrix Multiplication (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W). For all kernels, it consumes less than 10 W, in compliance with industry standards.
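The headline throughput can be sanity-checked with simple arithmetic. The sketch below assumes that each of the 1024 cores completes one multiply-accumulate (counted as 2 operations) per cycle, which the abstract does not state, and evaluates peak throughput at the three reported clock targets; `peak_tops` is a hypothetical helper.

```python
# Back-of-the-envelope peak throughput for the reported clock targets.
NUM_CORES = 1024
OPS_PER_CORE_PER_CYCLE = 2          # ASSUMPTION: one MAC (2 ops) per core per cycle
CLOCKS_MHZ = (730, 880, 924)        # TT/0.80 V/25 °C configurations from the abstract


def peak_tops(cores: int, ops_per_cycle: int, clock_mhz: float) -> float:
    """Peak TOPS = cores x ops/cycle x cycles/s / 1e12."""
    return cores * ops_per_cycle * clock_mhz * 1e6 / 1e12


for f in CLOCKS_MHZ:
    print(f"{f} MHz -> {peak_tops(NUM_CORES, OPS_PER_CORE_PER_CYCLE, f):.2f} TOPS")
```

Under that assumption, the 924 MHz configuration gives about 1.89 TOPS, consistent with the figure in the title, while the 730 MHz and 880 MHz configurations scale linearly to roughly 1.50 and 1.80 TOPS. These are peak values; sustained kernel throughput is lower whenever cores stall on bank conflicts or interconnect latency.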
