Learning Semantics, Not Addresses: Runtime Neural Prefetching for Far Memory

Yutong Huang, Zhiyuan Guo, Yiying Zhang

TL;DR

FarSight addresses the challenge of prefetching for far-memory systems by decoupling application semantics from runtime memory addresses and training a compact neural predictor offline. It represents memory access outcomes as ordinals in a small vocabulary and resolves them at runtime with per-page future maps, enabling low-latency, look-ahead prefetching. Implemented in the Linux kernel with a per-core RetNet-based predictor, FarSight achieves sub-microsecond inferences and demonstrates up to $3.6\times$ speedups over state-of-the-art baselines across four data-intensive workloads. The work shows the practicality of DL-based prefetching in far-memory systems and introduces design choices like rotary positional encoding and offline input generalization to make the approach robust and scalable.

Abstract

Memory prefetching has long boosted CPU caches and is increasingly vital for far-memory systems, where large portions of memory are offloaded to cheaper, remote tiers. While effective prefetching requires accurate prediction of future accesses, prior ML approaches have been limited to simulation or small-scale hardware. We introduce FarSight, the first Linux-based far-memory system to leverage deep learning by decoupling application semantics from runtime memory layout. This separation enables offline-trained models to predict access patterns over a compact ordinal vocabulary, which are resolved at runtime through lightweight mappings. Across four data-intensive workloads, FarSight delivers up to 3.6x higher performance than the state-of-the-art.
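The decoupling described above can be illustrated with a small sketch. It assumes that a per-page future map is a fixed-size table of $K$ slots, that an ordinal is simply an index into that table, and that the neural predictor is replaced here by an ordinal passed in by the caller. The names `FutureMap`, `observe`, and `prefetch_target`, as well as the slot-assignment and eviction policy, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

K = 4  # vocabulary size; 4 matches the example in Figure 2


class FutureMap:
    """Hypothetical per-page table mapping ordinal -> address seen after this page."""

    def __init__(self, k: int = K) -> None:
        self.slots: List[Optional[int]] = [None] * k

    def record(self, next_addr: int) -> int:
        """Return the ordinal for next_addr, assigning a free slot if needed."""
        if next_addr in self.slots:
            return self.slots.index(next_addr)
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = next_addr
                return i
        # Assumption: overwrite slot 0 when the map is full.
        self.slots[0] = next_addr
        return 0

    def resolve(self, ordinal: int) -> Optional[int]:
        """Translate a predicted ordinal back into a concrete address."""
        return self.slots[ordinal]


future_maps: Dict[int, FutureMap] = {}


def observe(page: int, next_page: int) -> int:
    """Record that next_page followed page; return the ordinal used as a label."""
    return future_maps.setdefault(page, FutureMap()).record(next_page)


def prefetch_target(page: int, predicted_ordinal: int) -> Optional[int]:
    """Resolve a model output (an ordinal) to an address to prefetch, if known."""
    fmap = future_maps.get(page)
    return fmap.resolve(predicted_ordinal) if fmap else None


first = observe(0x1000, 0x5000)
second = observe(0x1000, 0x9000)
target = prefetch_target(0x1000, second)
```

Because the predictor in this sketch only ever sees and emits ordinals in `range(K)`, it never needs to know the concrete addresses, which is the property that lets a model trained offline remain usable when the runtime memory layout changes.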

Paper Structure

This paper contains 13 sections, 23 figures, 1 table.

Figures (23)

  • Figure 1: FarSight overall architecture. All red parts are FarSight.
  • Figure 2: FarSight prediction representation. An example with a vocabulary size ($K$) of 4. The top part shows the code/algorithm corresponding to the accesses of chunks addr-x, addr-b, and addr-e. The bottom shows the input to the model: the chunk addresses and PCs of the 5 previous misses.
  • Figure 3: FarSight's prediction optimization methods. Demonstrates the use of each history window to predict $s$ misses ahead of time, predicting $f=2$ pages at a time.
  • Figure 4: MCF performance.
  • Figure 5: XGBoost performance.
  • ...and 18 more figures