Learning Semantics, Not Addresses: Runtime Neural Prefetching for Far Memory
Yutong Huang, Zhiyuan Guo, Yiying Zhang
TL;DR
FarSight tackles prefetching for far-memory systems by decoupling application semantics from runtime memory addresses and training a compact neural predictor offline. The predictor emits memory access outcomes as ordinals from a small vocabulary, and the runtime resolves each ordinal to a concrete address through per-page future maps, enabling low-latency, look-ahead prefetching. Implemented in the Linux kernel with a per-core RetNet-based predictor, FarSight achieves sub-microsecond inference and delivers speedups of up to $3.6\times$ over state-of-the-art baselines across four data-intensive workloads. The work shows that DL-based prefetching is practical in far-memory systems and introduces design choices, such as rotary positional encoding and offline input generalization, that make the approach robust and scalable.
Abstract
Memory prefetching has long improved CPU cache performance and is increasingly vital for far-memory systems, where large portions of memory are offloaded to cheaper, remote tiers. While effective prefetching requires accurate prediction of future accesses, prior ML approaches have been limited to simulation or small-scale hardware. We introduce FarSight, the first Linux-based far-memory system to leverage deep learning by decoupling application semantics from runtime memory layout. This separation enables offline-trained models to predict access patterns over a compact ordinal vocabulary, with each predicted ordinal resolved to a concrete address at runtime through lightweight mappings. Across four data-intensive workloads, FarSight delivers up to $3.6\times$ higher performance than the state of the art.
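To make the ordinal idea concrete, the following is a minimal Python sketch of how a per-page future map might turn a predicted ordinal into a prefetch address. It is an illustration under stated assumptions, not FarSight's kernel implementation: the names `FutureMap`, `Runtime`, and `VOCAB_SIZE`, the vocabulary size of 4, and the round-robin slot replacement are all hypothetical, and the neural predictor is replaced by an ordinal supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical vocabulary size: how many distinct "next page" outcomes
# a single page can name. The value used by FarSight is not given here.
VOCAB_SIZE = 4


@dataclass
class FutureMap:
    """Per-page table mapping an ordinal (slot index) to a successor page."""
    slots: List[Optional[int]] = field(
        default_factory=lambda: [None] * VOCAB_SIZE
    )
    next_victim: int = 0  # round-robin replacement cursor (an assumption)

    def record(self, next_page: int) -> int:
        """Record an observed successor and return its ordinal."""
        if next_page in self.slots:
            return self.slots.index(next_page)
        if None in self.slots:
            ordinal = self.slots.index(None)
        else:
            # Table is full: overwrite a slot in round-robin order.
            ordinal = self.next_victim
            self.next_victim = (self.next_victim + 1) % VOCAB_SIZE
        self.slots[ordinal] = next_page
        return ordinal

    def resolve(self, ordinal: int) -> Optional[int]:
        """Translate a predicted ordinal back into a page address."""
        return self.slots[ordinal]


class Runtime:
    """Holds one future map per page seen so far."""

    def __init__(self) -> None:
        self.maps: Dict[int, FutureMap] = {}

    def observe(self, prev_page: int, next_page: int) -> int:
        """Update prev_page's map with an observed transition."""
        return self.maps.setdefault(prev_page, FutureMap()).record(next_page)

    def prefetch_target(self, page: int, ordinal: int) -> Optional[int]:
        """Resolve a model-predicted ordinal for `page`, if known."""
        fmap = self.maps.get(page)
        return None if fmap is None else fmap.resolve(ordinal)


rt = Runtime()
first = rt.observe(0x1000, 0x7000)   # first successor of page 0x1000
second = rt.observe(0x1000, 0x9000)  # a second, distinct successor
# A predictor (not modeled here) would emit an ordinal such as `second`.
target = rt.prefetch_target(0x1000, second)
```

The point of the indirection is that the model only ever sees and emits small integers, so a predictor trained offline would not depend on the addresses a particular run happens to allocate; only the per-page tables hold run-specific addresses.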
