Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing

Alireza Khadem; Daichi Fujiki; Hilbert Chen; Yufeng Gu; Nishil Talati; Scott Mahlke; Reetuparna Das

TL;DR

This work addresses the underutilization of mobile vector hardware by extending long-vector ISA design to support multi-dimensional data layouts and memory accesses within the cache. The proposed MVE ISA abstracts cache geometry, provides multi-dimensional strided and random accesses with dimension-level masking, and couples a compute-capable cache architecture to a scalar core. On average, MVE delivers a 2.9× speedup and an 8.8× energy reduction over the SIMD units of a commercial mobile processor, at an area overhead of 3.6%; it also outperforms one-dimensional RVV and compares favorably with a mobile GPU on fine-grained data-parallel workloads. Together, these results show that a general-purpose, multi-dimensional long-vector ISA with a closely integrated cache design is a practical way to raise mobile in-cache computing performance.

Abstract

In-cache computing technology transforms existing caches into long-vector compute units and offers low-cost alternatives to building expensive vector engines for mobile CPUs. Unfortunately, existing long-vector Instruction Set Architecture (ISA) extensions, such as RISC-V Vector Extension (RVV) and Arm Scalable Vector Extension (SVE), provide only one-dimensional strided and random memory accesses. While this is sufficient for typical vector engines, it fails to effectively utilize the large Single Instruction, Multiple Data (SIMD) widths of in-cache vector engines. This is because mobile data-parallel kernels expose limited parallelism across a single dimension. Based on our analysis of mobile vector kernels, we introduce a long-vector Multi-dimensional Vector ISA Extension (MVE) for mobile in-cache computing. MVE achieves high SIMD resource utilization and enables flexible programming by abstracting cache geometry and data layout. The proposed ISA features multi-dimensional strided and random memory accesses and efficient dimension-level masked execution to encode parallelism across multiple dimensions. Using a wide range of data-parallel mobile workloads, we demonstrate that MVE offers significant performance and energy reduction benefits of 2.9x and 8.8x, on average, compared to the SIMD units of a commercial mobile processor, at an area overhead of 3.6%.
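
The multi-dimensional strided access and dimension-level masking described above can be illustrated with a minimal Python sketch. It is not the MVE ISA or its intrinsics: the function names `lane_addresses` and `dimension_mask`, the element-granularity addresses, and the innermost-first ordering of dimensions are all assumptions made for illustration. The sketch only shows how a multi-dimensional logical register could be flattened onto 1D SIMD lanes, with a stride of 0 standing in for replication.

```python
from itertools import product

def lane_addresses(base, lengths, strides):
    """Return one element address per flattened SIMD lane (hypothetical helper).

    lengths[0] and strides[0] describe the innermost (fastest-varying) dimension.
    A stride of 0 replicates the same elements along that dimension.
    """
    if len(lengths) != len(strides):
        raise ValueError("lengths and strides must have the same rank")
    addrs = []
    # product() varies its last argument fastest, so pass dimensions outermost-first.
    for idx in product(*(range(n) for n in reversed(lengths))):
        inner_first = idx[::-1]
        addrs.append(base + sum(i * s for i, s in zip(inner_first, strides)))
    return addrs

def dimension_mask(lengths, outer_enabled):
    """Per-lane mask that disables whole iterations of the outermost dimension."""
    if len(outer_enabled) != lengths[-1]:
        raise ValueError("need one flag per outermost-dimension iteration")
    lanes_per_outer = 1
    for n in lengths[:-1]:
        lanes_per_outer *= n
    mask = []
    for enabled in outer_enabled:
        mask.extend([bool(enabled)] * lanes_per_outer)
    return mask

if __name__ == "__main__":
    # 2D access: 4 contiguous elements per row, 2 rows that are 8 elements apart.
    print(lane_addresses(100, [4, 2], [1, 8]))
    # Stride 0 in the outer dimension replicates the 2-element row 3 times.
    print(lane_addresses(0, [2, 3], [1, 0]))
    # Mask off the second outer-loop iteration of the 4x2 register.
    print(dimension_mask([4, 2], [True, False]))
```

A single 1D strided load would need one instruction per row in the first example; encoding both dimensions in one access is what lets a wide in-cache register be filled when each dimension alone exposes little parallelism.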

Paper Structure (34 sections, 2 equations, 14 figures, 6 tables, 1 algorithm)

Introduction
Background
Long Vector Processing
In-Cache Computing
Instruction Set Architecture
Design Goals
Physical Register Abstraction
Strided Memory Access with Possible Replication
Random Memory Access with Striding and Replication
Efficient Masked Execution
Programming Model
Compiler
Common Data-Parallel Patterns
Design
Core Micro-architecture
...and 19 more sections

Figures (14)

Figure 1: (a) Mobile core with in-cache computing enabled for half of the L2 cache. (b) In-SRAM computing activates two word-lines of an SRAM array using an extra row decoder. (c) Bit-Serial (blue), Bit-Hybrid, and Bit-Parallel (blue + orange) modifications to the bitline peripheral.
Figure 2: (a) MVE operates on N long-vector in-cache registers. (b) In-cache data elements and SIMD lanes use the vertical data layout of bit-lines. (c) An in-cache physical register spans all compute-capable SRAM arrays.
Figure 3: Strided memory access example of Intrapicture Prediction kernel: loading from (a) 2D memory layout to (b) 3D logical registers, mapped to (c) the SIMD lanes of flattened-out physical registers by MVE controller.
Figure 4: Random memory access of h2v2 Upsample kernel: loading from (a) random row pointers to (b) 4D logical registers. (c) shows the SIMD lanes of the flattened-out physical registers.
Figure 5: MVE Controller maps multi-dimensional logical registers to 1D Physical SIMD Registers. Efficient dimension-level masked execution masks off leaves under a node in the highest dimension of the tree (iterations of the outer-most loop).
...and 9 more figures
