Efficient Architecture for RISC-V Vector Memory Access

Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang

TL;DR

EARTH introduces shifting-based data reorganization to optimize RISC-V vector memory access, addressing inefficiencies in strided and segment patterns. The design centers on three components: DROM, whose shift networks (GSN/SSN) move data between memory and registers; LSDO, which coalesces strided accesses; and RCVRF, which handles segment operations without a dedicated buffer. Implemented in Chisel on Saturn, an open-source RISC-V vector unit, EARTH delivers 4x-8x speedups on stride-heavy workloads while reducing area by 9% and power by 41%, and it maintains parity on segment workloads. The work demonstrates a practical, scalable way to improve vector memory performance with less hardware overhead, enabling more efficient vector CPUs and potential multi-LSU extensions.

Abstract

Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or scattering elements at fixed strides remains challenging. Naive approaches rely on high-overhead crossbars that remap any byte between memory and registers, leading to physical design issues. Segment operations require row-column transpositions, typically handled using either element-level in-place transposition (degrading performance) or large buffer-based bulk transposition (incurring high area overhead). In this paper, we present EARTH, a novel vector memory access architecture designed to overcome these challenges through shifting-based optimizations. For strided accesses, EARTH integrates specialized shift networks for gathering and scattering elements. After coalescing multiple accesses within the same cache line, data is routed between memory and registers through the shifting network with minimal overhead. For segment operations, EARTH employs a shifted register bank enabling direct column-wise access, eliminating dedicated segment buffers while providing high-performance, in-place bulk transposition. Implemented on FPGA with Chisel HDL based on an open-source RISC-V vector unit, EARTH enhances performance for strided memory accesses, achieving 4x-8x speedups in benchmarks dominated by strided operations. Compared to conventional designs, EARTH reduces hardware area by 9% and power consumption by 41%, significantly advancing both performance and efficiency of vector processors.
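The two access patterns discussed in the abstract can be illustrated with a small software model. The sketch below is not EARTH's hardware design; it only shows what "coalescing accesses within the same cache line" and "row-column transposition" mean at the data level. The function names, the 64-byte line size, and the requirement that no element straddles a line are assumptions made for illustration.

```python
from collections import OrderedDict


def coalesce_strided(base, stride, eew_bytes, vl, line_bytes=64):
    """Group the element accesses of a strided vector load by cache line.

    Returns a list of (line_address, [(element_index, byte_offset), ...]),
    one entry per distinct cache line, in first-touch order.
    Assumption: no element straddles a cache-line boundary.
    """
    groups = OrderedDict()
    for i in range(vl):
        addr = base + i * stride
        line = addr - (addr % line_bytes)   # address of the containing line
        offset = addr - line                # byte offset inside that line
        if offset + eew_bytes > line_bytes:
            raise ValueError("element %d straddles a cache line" % i)
        groups.setdefault(line, []).append((i, offset))
    return list(groups.items())


def segment_deinterleave(memory, nfields, vl):
    """Model a segment load: turn an array of structures into one list per field.

    memory holds vl structures of nfields elements each, stored back to back.
    The result is the transposed layout, with field f collected in result[f].
    """
    return [[memory[i * nfields + f] for i in range(vl)] for f in range(nfields)]
```

With `base=0`, `stride=16`, 4-byte elements, and `vl=8`, the eight element accesses fall into two 64-byte lines, so a coalescing unit would issue two line requests instead of eight. Each line then needs its four elements gathered from offsets 0, 16, 32, and 48 into adjacent register positions, which is the data-movement step the abstract assigns to the shift networks.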

Paper Structure

This paper contains 40 sections, 4 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: RVV Memory Access Patterns
  • Figure 2: Crossbar Network for Byte-Level Remapping in Naive Strided Access Coalescing
  • Figure 3: Segment Buffer
  • Figure 4: Timeline of methods to support segment instructions
  • Figure 5: EARTH Architecture Overview
  • ...and 10 more figures