MemPool Flavors: Between Versatility and Specialization in a RISC-V Manycore Cluster
Sergio Mazzola, Yichao Zhang, Marco Bertuletti, Diyou Shen, Luca Benini
TL;DR
MemPool addresses the challenge of running memory-intensive, parallel workloads on a programmable, low-power manycore cluster by providing a scalable, open-source RISC-V platform with shared-L1 memory and a hierarchical interconnect. It comes in three flavors (Baseline MemPool, Systolic MemPool, and Vectorial MemPool) that cover a wide trade-off space between versatility and specialization. Using a $32$-bit floating-point matmul kernel as a common architectural proxy, the paper characterizes each flavor: Baseline achieves about $59\%$ utilization; Systolic improves matmul performance by roughly $7\%$ at a $5\%$ area overhead by reducing loads and stores; and Vectorial reaches up to $94\%$ utilization on vectorizable workloads at the cost of a larger tile area. All flavors operate at $800\,\mathrm{MHz}$ and deliver up to $204.8\,\mathrm{GFLOP/s}$. These results illustrate the breadth of design choices that MemPool enables and support its role as an adaptable research platform for exploring the balance between general-purpose programmability and hardware specialization.
Abstract
As computational paradigms evolve, applications such as attention-based models, wireless telecommunications, and computer vision impose increasingly challenging requirements on computer architectures: they demand large memory footprints and substantial computing resources, while still requiring flexibility and programmability at a low power budget. Thanks to their advantageous trade-offs, shared-L1-memory clusters have become a common building block of massively parallel computing architectures tackling these issues. MemPool is an open-source, RISC-V-based manycore cluster scaling up to 1024 processing elements (PEs). MemPool offers a scalable, extensible, and programmable solution to the challenges of shared-L1 clusters, establishing itself as an open-source research platform for architectural variants covering a wide trade-off space between versatility and performance. As a demonstration, this paper compares the three main MemPool flavors (Baseline MemPool, Systolic MemPool, and Vectorial MemPool), detailing their architecture, targets, and achieved trade-offs.
