Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching

Zixiao Chen, Chentao Wu, Yunfei Gu, Ranhao Jia, Jie Li, Minyi Guo

TL;DR

Gaze rethinks spatial prefetching by exploiting the temporal correlations inside a memory footprint, rather than relying on environmental context alone. It is a lightweight hardware prefetcher built around a Pattern History Module: it characterizes a spatial pattern using only the first two accesses to a footprint, and it applies a two-stage streaming strategy to keep prefetching under control in dense footprints. Across 201 traces from SPEC, PARSEC, Ligra, CloudSuite, and GAP/QMM, Gaze improves IPC while maintaining high accuracy and modest hardware overhead, outperforming the contemporary baselines PMP and vBerti in most scenarios (by 5.7% and 5.4% at single-core, and by 11.4% and 8.8% at eight-core). The gains hold for both single-core and multi-core systems across diverse workloads.

Abstract

Hardware prefetching is one of the most widely used techniques for hiding long data access latency. To address the challenges faced by hardware prefetching, architects have proposed to detect and exploit spatial locality at the granularity of spatial regions. When a new region is activated, these schemes try to find similar previously accessed regions for footprint prediction based on system-level environmental features such as the trigger instruction or data address. However, we find that such context-based prediction cannot capture the essential characteristics of access patterns, leading to limited flexibility and practicality and to suboptimal prefetching performance. In this paper, inspired by the temporal property of memory accesses, we note that the temporal correlation exhibited within the spatial footprint is a key feature of spatial patterns. To this end, we propose Gaze, a simple and efficient hardware spatial prefetcher that uses footprint-internal temporal correlations to characterize spatial patterns. Meanwhile, we observe a unique unresolved challenge in utilizing spatial footprints generated by spatial streaming, which exhibit extremely high access density. Therefore, we further enhance Gaze with a dedicated two-stage approach that mitigates the over-prefetching problem commonly encountered in conventional schemes. Our comprehensive and diverse set of experiments shows that Gaze can effectively enhance performance across a wider range of scenarios. Specifically, Gaze improves performance by 5.7% and 5.4% at single-core, and by 11.4% and 8.8% at eight-core, compared to the most recent low-cost solutions PMP and vBerti, respectively.
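
To make the idea of characterizing a pattern by its first accesses concrete, here is a minimal software sketch. It is not the Gaze hardware design: it only illustrates indexing a footprint history by the first two block offsets touched in a region (the trigger offset and the second offset named in the Figure 3 caption). The class `ToyPatternPrefetcher`, its methods, the unbounded dictionary used as a history table, and the 64-byte block size are all assumptions made for illustration; the 4KB region size follows the Figure 4 caption. The two-stage handling of dense streaming footprints is not modeled.

```python
from dataclasses import dataclass, field

REGION_SIZE = 4096  # bytes; 4KB region, as in the Figure 4 caption
BLOCK_SIZE = 64     # bytes; assumed cache-block size (not stated in the text)
BLOCKS_PER_REGION = REGION_SIZE // BLOCK_SIZE


def split_address(addr: int) -> tuple[int, int]:
    """Return (region number, block offset within that region)."""
    return addr // REGION_SIZE, (addr % REGION_SIZE) // BLOCK_SIZE


@dataclass
class ToyPatternPrefetcher:
    """Hypothetical toy model: footprints keyed by their first two offsets."""

    # (trigger offset, second offset) -> footprint bitmask of a past region
    history: dict[tuple[int, int], int] = field(default_factory=dict)
    # region number -> distinct block offsets in the order they were touched
    active: dict[int, list[int]] = field(default_factory=dict)

    def access(self, addr: int) -> list[int]:
        """Record an access; return block offsets to prefetch (may be empty)."""
        region, offset = split_address(addr)
        seen = self.active.setdefault(region, [])
        if offset in seen:
            return []
        seen.append(offset)
        if len(seen) != 2:
            return []
        # On the second distinct access, look up a previously recorded
        # footprint that started with the same two offsets.
        footprint = self.history.get((seen[0], seen[1]), 0)
        return [b for b in range(BLOCKS_PER_REGION)
                if (footprint >> b) & 1 and b not in seen]

    def evict(self, region: int) -> None:
        """Region deactivated: store its footprint under its first two offsets."""
        seen = self.active.pop(region, [])
        if len(seen) >= 2:
            mask = 0
            for b in seen:
                mask |= 1 << b
            self.history[(seen[0], seen[1])] = mask
```

In this toy model, a region that touched blocks 3, 4, 7, 9 in that order and was then evicted would cause a later region whose first two accesses hit blocks 3 and 4 to receive prefetch candidates for blocks 7 and 9. A real design would bound the table sizes and decide when a region is deactivated; both are left out here.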

Paper Structure

This paper contains 38 sections, 18 figures, and 6 tables.

Figures (18)

  • Figure 1: Speedup achieved by different context-based characterization schemes and their hardware overheads. Suffix -opt means an optimized version from recent literature.
  • Figure 2: The detailed reference footprints of several spatial regions. These regions are accessed close in time. The trigger accesses are circled in red. The footprint of region B is only partially shown.
  • Figure 3: Design overview of Gaze (b) and its detailed learning (a) and prefetching (c) process. For simplicity, we use trigger and second to refer to the trigger offset and the second offset, omitting the term offset. H and M mean a hit and a miss in the corresponding structure, respectively.
  • Figure 4: Effect of extending the number of aligned initial accesses required for a match. The region size is set to 4KB.
  • Figure 5: Pseudocode of BFS-based graph processing and an example of its illustration.
  • ...and 13 more figures