An Early Exploration of Deep-Learning-Driven Prefetching for Far Memory
Yutong Huang, Zhiyuan Guo, Yiying Zhang
TL;DR
The paper addresses the latency-energy trade-off in far-memory systems for data-center workloads. It proposes Memix, a deep-learning-guided prefetcher integrated into Linux swap that decouples application semantics from runtime memory layout by combining patterns learned offline with address maps built at runtime. Key contributions include a two-part separation of prediction tasks, a per-page future-map mechanism, a compact kernel-based RetNet predictor with $K=64$, and an evaluation on XGBoost, PageRank, and MCF showing speedups of up to $42\%$ over the state-of-the-art far-memory system. The results suggest that Memix improves end-to-end performance and energy efficiency in data-center memory hierarchies by reducing on-demand far-memory accesses for repetitive access patterns.
Abstract
Far-memory systems, in which applications store less-active data in more energy-efficient memory media, are increasingly adopted by data centers. However, applications are bottlenecked by on-demand data fetching from far memory to local memory. We present Memix, a far-memory system that embodies a deep-learning-system co-design for efficient and accurate prefetching, minimizing on-demand far-memory accesses. One key observation is that memory accesses are shaped by both application semantics and runtime context, which provides an opportunity to optimize each independently. A preliminary evaluation of Memix on data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 42%.
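
The idea of separating application semantics from runtime context can be illustrated with a toy sketch. This is not Memix's implementation: the names `SemanticPredictor`, `RuntimeAddressMap`, and `pages_to_prefetch` are hypothetical, and a simple successor-frequency table stands in for the learned model. The sketch only shows how a pattern learned once over logical object IDs might be reused across runs whose page layouts differ.

```python
from collections import Counter, defaultdict


class SemanticPredictor:
    """Toy stand-in for an offline-trained model (assumption, not Memix's model).

    It learns which logical object tends to follow which, using a trace of
    logical IDs that carries no address information.
    """

    def __init__(self):
        self.successors = defaultdict(Counter)

    def train(self, logical_trace):
        # Count how often each object is immediately followed by another.
        for cur, nxt in zip(logical_trace, logical_trace[1:]):
            self.successors[cur][nxt] += 1

    def predict(self, cur, k=2):
        # Return up to k most frequent successors of the current object.
        counts = self.successors.get(cur, Counter())
        return [obj for obj, _ in counts.most_common(k)]


class RuntimeAddressMap:
    """Per-run mapping from logical object IDs to page numbers (hypothetical)."""

    def __init__(self):
        self.page_of = {}

    def record(self, logical_id, page):
        self.page_of[logical_id] = page

    def translate(self, logical_ids):
        # Objects not yet placed in this run are skipped.
        return [self.page_of[i] for i in logical_ids if i in self.page_of]


def pages_to_prefetch(predictor, addr_map, current_logical_id, k=2):
    """Combine the two halves: predict logical IDs, then resolve them to pages."""
    return addr_map.translate(predictor.predict(current_logical_id, k))


# Train once on a logical trace, then reuse the predictor in two runs
# that place the same objects on different pages.
predictor = SemanticPredictor()
predictor.train(["A", "B", "C", "A", "B", "C", "A", "B"])

run1 = RuntimeAddressMap()
for obj, page in {"A": 0x10, "B": 0x2A, "C": 0x07}.items():
    run1.record(obj, page)

run2 = RuntimeAddressMap()
for obj, page in {"A": 0x55, "B": 0x99, "C": 0x01}.items():
    run2.record(obj, page)

prefetch_run1 = pages_to_prefetch(predictor, run1, "A", k=1)
prefetch_run2 = pages_to_prefetch(predictor, run2, "A", k=1)
```

In this sketch the same prediction ("B follows A") resolves to page `0x2A` in the first run and `0x99` in the second, so only the address map needs to be rebuilt when the layout changes.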
