Mercury: QoS-Aware Tiered Memory System

Jiaheng Lu, Yiwen Zhang, Hasan Al Maruf, Minseo Park, Yunxuan Tang, Fan Lai, Mosharaf Chowdhury

TL;DR

Mercury addresses the problem of performance unpredictability in tiered memory systems when multiple memory-intensive applications share resources. It introduces application-level QoS through per-tier page reclamation, a proactive admission control algorithm, and real-time adaptation to maintain SLOs amid local memory contention and memory bandwidth interference. The design is implemented in the Linux kernel with a memory profiler and a resource controller, and evaluated on 80 workloads, showing up to 53.4% better performance than TPP and 20.3% better than Colloid, plus significantly improved SLO satisfaction in long-running scenarios. The work demonstrates that combining per-tier memory management with proactive admission and dynamic adjustment yields predictable performance and higher resource utilization in CXL-enabled tiered memory systems.

Abstract

Memory tiering has received wide adoption in recent years as an effective solution to address the increasing memory demands of memory-intensive workloads. However, existing tiered memory systems often fail to meet service-level objectives (SLOs) when multiple applications share the system because they lack Quality-of-Service (QoS) support. Consequently, applications suffer severe performance drops due to local memory contention and memory bandwidth interference. In this paper, we present Mercury, a QoS-aware tiered memory system that ensures predictable performance for coexisting memory-intensive applications with different SLOs. Mercury enables per-tier page reclamation for application-level resource management and uses a proactive admission control algorithm to satisfy SLOs via per-tier memory capacity allocation and intra- and inter-tier bandwidth interference mitigation. It reacts to dynamic requirement changes via real-time adaptation. Extensive evaluations show that Mercury improves application performance by up to 53.4% and 20.3% compared to TPP and Colloid, respectively.

Paper Structure

This paper contains 29 sections and 16 figures.

Figures (16)

  • Figure 1: Latency and bandwidth performance at different CXL interleaving ratios to illustrate the impact of local memory.
  • Figure 2: Performance of LS when BI requires bandwidth at different CXL interleaving percentages. Migrating BI to CXL does not always improve LS performance because of inter-tier interference. BI's performance is very close to that in Figure \ref{fig:moti-bw-sweep} and is omitted for brevity.
  • Figure 3: Architectural diagram of how memory requests are handled in CXL-enabled tiered memory.
  • Figure 4: Performance of LS at different CXL interleaving ratios when BI is fixed on local memory. Migrating more requests away from local memory does not improve performance, because more requests then access the slower tier.
  • Figure 5: Unpredictable performance of 80 workloads and VectorDB as they compete for local memory on the fast tier. Existing solutions cannot distinguish among applications when migrating their hot pages, and thus cannot provide QoS guarantees.
  • ...and 11 more figures