The Bicameral Cache: a split cache for vector architectures

Susana Rebolledo, Borja Perez, Jose Luis Bosque, Peter Hsu

TL;DR

The paper addresses memory performance in vector architectures by separating scalar and vector data into two exclusive caches, the Scalar Cache and Vector Cache, to preserve their distinct locality patterns. It introduces sectorized cache lines, an exclusive cross-cache policy with vector data migration, an embedded write-back strategy, and a memory-side prefetching mechanism that exploits row-buffer locality. Evaluated on the Cavatools RVV simulator, the Bicameral Cache achieves notable stride-1 gains (up to 1.57x with prefetch) and up to 11% improvements for non-stride-1 workloads, without increasing hardware complexity. The approach highlights the practical impact of tailoring memory hierarchies to vector workloads and recommends prefetching as a core optimization for vector performance.
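The split, exclusive organization can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the authors' Cavatools implementation. The sizes (64-byte sectors, 8-sector vector lines) and the fully associative LRU replacement are hypothetical. The migration rule is also assumed: a vector reference pulls a matching block out of the Scalar Cache, and a scalar reference that hits in the Vector Cache is served in place. The names `LRUCache` and `BicameralModel` are invented here. Write-back and prefetching are omitted.

```python
from collections import OrderedDict

SECTOR = 64                        # bytes per sector = scalar line size (assumed)
SECTORS_PER_VLINE = 8              # vector line = 8 sectors = 512 B (assumed)
VLINE = SECTOR * SECTORS_PER_VLINE


class LRUCache:
    """Fully associative LRU store mapping a tag to per-line state."""

    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.lines = OrderedDict()

    def lookup(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)       # mark as most recently used
            return self.lines[tag]
        return None

    def insert(self, tag, state):
        self.lines[tag] = state
        self.lines.move_to_end(tag)
        if len(self.lines) > self.n_lines:
            self.lines.popitem(last=False)    # evict the least recently used line


class BicameralModel:
    """Toy split cache (hypothetical): scalar refs use the scalar cache,
    vector refs use the vector cache. Each 64-byte block lives in at most
    one of the two caches (exclusive)."""

    def __init__(self, scalar_lines=64, vector_lines=16):
        self.scalar = LRUCache(scalar_lines)  # tag: block number, state: True
        self.vector = LRUCache(vector_lines)  # tag: vector line number, state: set of valid sectors
        self.hits = 0
        self.misses = 0

    def access(self, addr, is_vector):
        block = addr // SECTOR
        vtag, sector = addr // VLINE, (addr % VLINE) // SECTOR
        if is_vector:
            valid = self.vector.lookup(vtag)
            if valid is not None and sector in valid:
                self.hits += 1
                return "hit"
            if valid is None:                 # allocate an empty sectorized line
                valid = set()
                self.vector.insert(vtag, valid)
            valid.add(sector)                 # fill only the requested sector
            # Assumed migration: move the block out of the scalar cache if present.
            if self.scalar.lines.pop(block, None) is not None:
                self.hits += 1
                return "migrated"
            self.misses += 1
            return "miss"
        if self.scalar.lookup(block) is not None:
            self.hits += 1
            return "hit"
        valid = self.vector.lines.get(vtag)
        if valid is not None and sector in valid:
            self.hits += 1                    # served in place; no copy keeps exclusivity
            return "hit_in_vector"
        self.scalar.insert(block, True)
        self.misses += 1
        return "miss"
```

In this model, routing on the `is_vector` flag is what keeps long vector lines from being evicted by scalar traffic. The per-line sector set shows why a long line need not be fetched whole on every miss.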

Abstract

The Bicameral Cache is a cache organization proposal for a vector architecture that segregates data according to their access type, distinguishing scalar from vector references. Its aim is to prevent the two types of references from interfering with each other's data locality, with a special focus on prioritizing the performance of vector references. The proposed system incorporates an additional, non-polluting prefetching mechanism that populates the long vector cache lines in advance, increasing the hit rate by further exploiting the spatial locality of vector data. It was evaluated on the Cavatools simulator against a conventional cache, across a set of typical vector benchmarks and several vector lengths. The results show that the proposed cache speeds up stride-1 vector benchmarks while having little impact on non-stride-1 ones. In addition, the prefetching feature consistently provided additional benefit.
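A memory-side prefetch that fills the remaining sectors of a vector line from the open DRAM row can be sketched as follows. Several details are assumptions for illustration, and the paper's exact policy may differ:

- the row-buffer size and sector geometry;
- the rule that only invalid sectors of the already-allocated line are filled, which is one way to make the prefetch non-polluting, since it never allocates or evicts a line;
- the function name `sectors_to_prefetch`.

```python
def sectors_to_prefetch(miss_addr, valid_sectors,
                        sector_bytes=64, sectors_per_line=8, row_bytes=2048):
    """Return sector indices of the missing vector line to fill alongside
    the demand sector (hypothetical policy).

    A sector is chosen only if it is not already valid, is not the demand
    sector itself, and lies in the same DRAM row as the miss, so it can be
    read from the already-open row buffer. Nothing outside the allocated
    vector line is touched, so no other cache line is displaced.
    """
    line_bytes = sector_bytes * sectors_per_line
    line_base = miss_addr - (miss_addr % line_bytes)
    demand = (miss_addr % line_bytes) // sector_bytes
    open_row = miss_addr // row_bytes
    result = []
    for s in range(sectors_per_line):
        if s == demand or s in valid_sectors:
            continue
        if (line_base + s * sector_bytes) // row_bytes == open_row:
            result.append(s)
    return result


print(sectors_to_prefetch(128, {0}))                 # all missing sectors share the row
print(sectors_to_prefetch(128, {0}, row_bytes=256))  # row check trims the prefetch
```

With the default 2 KB row, an aligned 512-byte line always sits inside a single row, so every missing sector qualifies. The smaller `row_bytes=256` case exists only to show the row check limiting the prefetch to sectors that share the open row.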

Paper Structure (15 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Structural representation of the lines and sectors of the Bicameral Cache.
  • Figure 2: Structural representation of the Bicameral Cache.
  • Figure 3: Speedup evaluation. BC: Bicameral Cache, W/O: without prefetching, PF: with prefetching, IDL: with ideal prefetching.
  • Figure 4: Average Memory Access Time. BC: Bicameral Cache, W/O: without prefetching, PF: with prefetching, IDL: with ideal prefetching.
  • Figure 5: Detailed analysis. BC: Bicameral Cache, W/O: without prefetching, PF: with prefetching, IDL: with ideal prefetching.