Instruction Scheduling in the Saturn Vector Unit
Jerry Zhao, Daniel Grubb, Miles Rusch, Tianrui Wei, Kevin Anderson, Borivoje Nikolic, Krste Asanovic
TL;DR
Saturn targets mobile and edge contexts, where long-vector designs are inefficient, with a short-vector unit that is fully RVV 1.0-compliant and provides fine-grained vector chaining and decoupled memory paths. The design combines explicit per-element-group hazard tracking, a compact backend with limited out-of-order (OoO) sequencing, and a decoupled load-store path, which together enable run-ahead memory accesses and high datapath utilization for short vectors. Key contributions include the Saturn RTL implementation (in Chisel), an area, power, and performance evaluation, and an analysis of design parameters such as chime length, issue queue depth, and memory latency. The results show Saturn achieving competitive power and area while sustaining high utilization across diverse workloads, illustrating the practicality of compact, scalable vector units for mobile and embedded applications.
Abstract
While the challenges and solutions for efficient execution of scalable vector ISAs on long-vector-length microarchitectures have been well established, not all of these solutions are suitable for short-vector-length implementations. This work proposes a novel microarchitecture for instruction sequencing in vector units with short architectural vector lengths. The proposed microarchitecture supports fine-granularity chaining, multi-issue out-of-order execution, zero dead-time, and run-ahead memory accesses at low area and complexity cost. We present the Saturn Vector Unit, an RTL implementation of an RVV vector unit. With our instruction scheduling mechanism, Saturn exhibits power, performance, and area characteristics comparable or superior to those of state-of-the-art long-vector and short-vector implementations.
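To make the idea of fine-granularity chaining concrete, the following toy Python model compares a dependent instruction that waits for its producer to finish entirely against one that may start as soon as the element group it needs has been written. It is an illustrative sketch only, not Saturn's RTL: the function name `simulate`, the `pending_write` bitmask, and the assumption that each instruction processes exactly one element group per cycle in order are all hypothetical simplifications.

```python
def simulate(num_groups: int, chained: bool) -> int:
    """Return the cycle on which a dependent consumer finishes its last element group.

    Toy assumptions (not taken from the Saturn RTL):
      - a producer writes one element group per cycle, in order;
      - a consumer reads the same register group, one element group per cycle;
      - hazards are checked against the state at the start of each cycle.
    """
    # Bit i set means element group i has not yet been written by the producer.
    pending_write = (1 << num_groups) - 1
    produced = 0  # element groups written so far
    consumed = 0  # element groups read so far
    cycle = 0
    while consumed < num_groups:
        cycle += 1
        if chained:
            # Per-element-group check: only the group about to be read matters.
            can_issue = not (pending_write >> consumed) & 1
        else:
            # Whole-register check: wait until the producer has fully completed.
            can_issue = pending_write == 0
        if can_issue:
            consumed += 1
        if produced < num_groups:
            # Producer retires one element group and clears its hazard bit.
            pending_write &= ~(1 << produced)
            produced += 1
    return cycle


if __name__ == "__main__":
    for groups in (1, 2, 4, 8):
        print(groups, simulate(groups, chained=True), simulate(groups, chained=False))
```

Under these assumptions the chained consumer trails the producer by a single cycle regardless of how many element groups the register group holds, whereas the unchained consumer's finish time grows with twice the number of groups.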
