A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR
Yichao Zhang, Marco Bertuletti, Chi Zhang, Samuel Riedel, Alessandro Vanelli-Coralli, Luca Benini
TL;DR
The paper addresses the escalating baseband compute demands of 6G software-defined radio (SDR) by proposing TeraPool-SDR, a 1024-core RISC-V cluster whose cores share 4 MiB of L1 memory and reach high-bandwidth memory through a modular DMA-backed hierarchy. It introduces a three-level crossbar interconnect and a DMA pipeline split into frontend, midend, and backend stages, validated with cycle-accurate DRAMSys simulations that model ultra-high data flows. Across key SDR kernels (FFT, beamforming, channel estimation, linear system inversion), the system keeps data-movement overhead below 9% while delivering 910 GBps of memory bandwidth at 98% efficiency, IPC above 0.6, sub-ms latency, and power under 8.8 W. The work demonstrates an open-source, scalable path toward low-latency, energy-efficient 6G baseband accelerators and weighs its latency-tolerant design configurations against measured benchmark performance.
Abstract
We introduce an open-source architecture for next-generation Radio-Access Network baseband processing: 1024 latency-tolerant 32-bit RISC-V cores share 4 MiB of L1 memory via an ultra-low-latency interconnect (7-11 cycles), and a modular Direct Memory Access engine provides an efficient link to a high-bandwidth memory such as HBM2E (98% of peak bandwidth at 910 GBps). The system achieves leading-edge energy efficiency at sub-ms latency in key 6G baseband processing kernels: Fast Fourier Transform (93 GOPS/W), Beamforming (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W), with only 9% data movement overhead.
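To put the headline figures in scale, the following back-of-envelope sketch computes the L1 capacity available per core and the time needed to refill the whole shared L1 once over the memory link. It is not taken from the paper: it assumes that 910 GBps is the achieved transfer rate, that 1 GB means 10^9 bytes, and that the L1 is divided evenly among cores. The function names are hypothetical and used only for illustration.

```python
# Back-of-envelope arithmetic on the figures quoted in the abstract.
# Assumptions (not stated in the source): 910 GBps is the achieved rate,
# 1 GB = 1e9 bytes, and L1 capacity is split evenly across cores.

NUM_CORES = 1024
L1_BYTES = 4 * 1024 * 1024        # 4 MiB of shared L1
LINK_BYTES_PER_S = 910e9          # 910 GBps, assumed achieved bandwidth


def l1_bytes_per_core(l1_bytes: int = L1_BYTES, cores: int = NUM_CORES) -> int:
    """Average L1 capacity per core under an even split."""
    return l1_bytes // cores


def l1_fill_time_us(l1_bytes: int = L1_BYTES,
                    rate: float = LINK_BYTES_PER_S) -> float:
    """Time in microseconds to transfer one full L1 image at the given rate."""
    return l1_bytes / rate * 1e6


if __name__ == "__main__":
    print(f"L1 per core: {l1_bytes_per_core()} bytes")
    print(f"Full L1 refill: {l1_fill_time_us():.2f} us")
```

Under these assumptions each core has 4 KiB of L1 on average, and a full refill of the shared L1 takes roughly 4.6 µs, which is small compared with the sub-ms latency budget quoted for the kernels.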
