Improved Prefetching Techniques for Linked Data Structures

Nikola Vuk Maruszewski

TL;DR

Linked data structures (LDSs) challenge traditional memory prefetching due to scattered nodes and irregular access patterns. The authors propose Linkey, a hybrid hardware-software prefetcher that uses lightweight software-provided metadata to configure an Address Table (AT), a Child Association Table (CAT), and a Backup Fetch Queue (BFQ), enabling timely, accurate prefetches without speculative pointer detection. Across traversal and lookup benchmarks, Linkey delivers a geometric mean miss-rate reduction of 13% (up to 58.8%) and a geometric mean accuracy improvement of 65.4%, with IPC gains up to 12.1% on applicable workloads. This work demonstrates how modest programmer/compiler hints, coupled with hardware tables and a flexible fetch pipeline, can substantially reduce memory stalls for pointer-chasing patterns, offering a practical path toward improved performance in LDS-heavy applications.

Abstract

With ever-increasing main memory stall times, we need novel techniques to reduce effective memory access latencies. Prefetching has been shown to be an effective solution, especially with contiguous data structures that follow the traditional principles of spatial and temporal locality. However, on linked data structures (made up of many nodes linked together with pointers), typical prefetchers struggle, failing to predict accesses because elements are arbitrarily scattered throughout memory and access patterns are arbitrarily complex and hence difficult to predict. To remedy these issues, we introduce $\textit{Linkey}$, a novel prefetcher that utilizes hints from the programmer/compiler to cache layout information and accurately prefetch linked data structures. $\textit{Linkey}$ obtains substantial performance improvements over a striding baseline. We achieve a geomean 13% reduction in miss rate with a maximum improvement of 58.8%, and a 65.4% geomean increase in accuracy, with many benchmarks improving from 0%. On benchmarks where $\textit{Linkey}$ is applicable, we observe a geomean IPC improvement of 1.40%, up to 12.1%.

Paper Structure

This paper contains 46 sections, 11 figures, 6 tables, 3 algorithms.

Figures (11)

  • Figure 2.1: Common types of linked data structures. Circles and rectangles represent nodes, the arrows represent pointers (i.e., the links) between nodes in the linked data structure.
  • Figure 3.2: Populated example of the Address and Child Association Tables for a system with 48-bit virtual addresses. The AT holds 64 entries and supports up to two CAT pointers. The CAT holds 512 elements. Note that entries are displayed vertically. The numbers on the bottom of the table represent entry indexes. Colors are used to highlight specific entries.
  • Figure 3.3: The Linkey fetch pipeline. This sits on the critical path and runs in parallel to the core's L1-D$ access.
  • Figure 4.4: A histogram of a Zipfian distribution with parameter $\theta=0.99$ on $[1, 100]$. The distribution was sampled $10^6$ times.
  • Figure 4.5: Normalized numbers of load misses with different prefetcher configurations. Rows represent benchmark sizes, colors represent configurations.
  • ...and 6 more figures
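
The Zipfian distribution mentioned in the caption of Figure 4.4 can be illustrated with a minimal sketch. It assumes the common bounded definition in which rank $k$ on $[1, N]$ is drawn with probability proportional to $1/k^{\theta}$; the paper may use a different generator (for example, a YCSB-style one). The function names `zipfian_weights`, `sample_zipfian`, and `histogram` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter
from itertools import accumulate


def zipfian_weights(n_items, theta):
    """Unnormalized weights: rank k gets weight 1 / k**theta (assumed definition)."""
    return [1.0 / (k ** theta) for k in range(1, n_items + 1)]


def sample_zipfian(n_items, theta, n_samples, seed=0):
    """Draw n_samples ranks from [1, n_items] with P(k) proportional to 1 / k**theta."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    cumulative = list(accumulate(zipfian_weights(n_items, theta)))
    return rng.choices(range(1, n_items + 1), cum_weights=cumulative, k=n_samples)


def histogram(samples):
    """Count how often each rank was drawn."""
    return Counter(samples)
```

Calling `histogram(sample_zipfian(100, 0.99, 1_000_000))` would mirror the parameters stated in the caption. Under this definition, low ranks dominate the counts, which is why such a distribution is often used to model skewed key popularity in lookup workloads.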