An Early Exploration of Deep-Learning-Driven Prefetching for Far Memory

Yutong Huang, Zhiyuan Guo, Yiying Zhang

TL;DR

The paper addresses the latency-energy trade-off in far-memory systems for data-center workloads. It proposes Memix, a DL-guided prefetcher integrated into Linux swap that decouples application semantics from runtime memory layout, combining patterns learned offline with address maps maintained at runtime. Key contributions include a separation of the prediction task into two parts, a per-page future-map mechanism, a compact kernel-based RetNet predictor with vocabulary size $K=64$, and an evaluation on XGBoost, PageRank, and MCF showing speedups of up to $42\%$. The results suggest that Memix improves end-to-end performance and energy efficiency in data-center memory hierarchies by reducing on-demand far-memory accesses for repetitive patterns.

Abstract

Far-memory systems, where applications store less-active data in more energy-efficient memory media, are increasingly adopted by data centers. However, applications are bottlenecked by on-demand data fetching from far memory to local memory. We present Memix, a far-memory system that embodies a deep-learning-system co-design for efficient and accurate prefetching, minimizing on-demand far-memory accesses. One key observation is that memory accesses are shaped by both application semantics and runtime context, providing an opportunity to optimize each independently. Preliminary evaluation of Memix on data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 42%.

Paper Structure

This paper contains 18 sections and 5 figures.

Figures (5)

  • Figure 1: Memix prediction representation. An example with a vocabulary size ($K$) of 4. The top part shows the code/algorithm corresponding to the accesses of chunks addr-x, addr-b, and addr-e. The bottom part shows the input to the model: the chunk addresses and PCs of the 5 previous misses.
  • Figure 2: Memix threading model. The left panel shows the Memix system’s memory layout. The middle panel shows local memory misses during single-core execution, where Memix waits only for on-demand requests. The right panel shows that prefetch hits incur no network request.
  • Figure 3: XGBoost performance
  • Figure 4: PageRank performance
  • Figure 5: MCF performance