Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads
Matteo Perotti, Michele Raeber, Mattia Sinigaglia, Matheus Cavalcante, Davide Rossi, Luca Benini
TL;DR
Spatzformer tackles underutilization in multi-core vector architectures by introducing a reconfigurable RVV-based dual-core cluster with split and merge modes to handle heterogeneous scalar and vector workloads. It extends a Spatz-based baseline and is implemented in a 12-nm technology node, with an area overhead of about 1.4% and no degradation of the maximum frequency. Results across six kernels show merge mode delivering up to 1.8× speedup over split mode for mixed scalar-vector workloads and up to 20% faster FFT execution relative to the baseline, with average energy efficiency losses around 5%. The work demonstrates a practical design point for balancing vector throughput against control-task latency in resource-constrained environments, improving resource utilization on mixed workloads.
Abstract
Multi-core vector processor architectures excel at computationally intensive vectorizable tasks but struggle to achieve optimal resource utilization when facing sequential and control tasks that cannot be vectorized. This work presents Spatzformer, the first reconfigurable RISC-V V (RVV) architecture, developed from a baseline open-source dual-core cluster based on Snitch scalar cores augmented with compact Spatz vector units. Spatzformer operates in two distinct modes: split mode, in which it works as a dual-core vector architecture that handles vectorizable tasks concurrently, and merge mode, in which a single scalar core drives both vector units, leaving the remaining scalar core free to handle non-vectorizable control tasks. We implement Spatzformer in a 12-nm technology node and characterize the cost of the added architectural reconfigurability. We show that merge mode accelerates mixed scalar-vector kernels by up to 1.8× compared to split mode. Moreover, it accelerates vector kernels that require fine-grained synchronization (such as FFT) by up to 20% with respect to the baseline. The reconfigurability features do not degrade the architecture's maximum frequency (1.2 GHz, TT, 0.8 V, 25 °C) and have a negligible area impact (+1.4%), with a worst-case energy efficiency drop of only 7% with respect to the non-reconfigurable baseline.
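To make the split/merge trade-off concrete, the following is a minimal back-of-the-envelope sketch, not the authors' evaluation methodology. It assumes a mixed workload with one vectorizable kernel and one independent scalar task that run concurrently, and that merge mode ideally halves the vector kernel's runtime. The function names and the `merge_efficiency` parameter are hypothetical and introduced only for illustration.

```python
def split_mode_time(vector_cycles, scalar_cycles):
    """Assumed split-mode schedule: one core runs the vector kernel on its
    own vector unit while the other core runs the scalar task, so the
    second vector unit stays idle."""
    return max(vector_cycles, scalar_cycles)


def merge_mode_time(vector_cycles, scalar_cycles, merge_efficiency=1.0):
    """Assumed merge-mode schedule: one scalar core drives both vector
    units while the other core runs the scalar task concurrently.
    merge_efficiency (hypothetical, 0 < e <= 1) scales the ideal 2x
    vector throughput to account for any overhead."""
    vector_time = vector_cycles / (2.0 * merge_efficiency)
    return max(vector_time, scalar_cycles)


def merge_speedup(vector_cycles, scalar_cycles, merge_efficiency=1.0):
    """Speedup of merge mode over split mode under this toy model."""
    return (split_mode_time(vector_cycles, scalar_cycles)
            / merge_mode_time(vector_cycles, scalar_cycles, merge_efficiency))


if __name__ == "__main__":
    # Vector-dominated mix: the ideal model is bounded at 2x.
    print(merge_speedup(1000, 200))
    # Scalar-bound mix: the extra vector unit does not help.
    print(merge_speedup(1000, 1000))
```

Under these assumptions the speedup is bounded by 2× and shrinks as the scalar task comes to dominate. The reported maximum of 1.8× falls below that ideal bound; the model says nothing about which overheads account for the gap.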
