LFOC+: A Fair OS-level Cache-Clustering Policy for Commodity Multicore Systems

Juan Carlos Saez, Fernando Castro, Graziano Fanizzi, Manuel Prieto-Matias

TL;DR

LFOC+ tackles shared LLC contention in multicore systems with a fairness-aware OS-level cache-clustering policy. It classifies applications at runtime using performance-monitoring counters (PMCs) and applies a pairing-based clustering strategy that allows up to two cache-sensitive applications per partition. Implemented as a Linux kernel module, LFOC+ extends the authors' prior LFOC approach. On Skylake hardware it improves fairness over the state-of-the-art Dunn and LFOC policies (by up to 22% and 20.6%, respectively) while maintaining comparable throughput, and it approaches the optimal fairness solution in many scenarios. The approach relies on offline simulation (PBBCache) to derive insights and on online PMC-driven adaptations (sampling and fairness phases) to react to workload changes, with special provisions for data-parallel multithreaded programs. The work also offers a practical path toward broader adoption via open-source tooling (PMCTrack) and discusses limitations and future directions, including deeper intra-application partitioning and support for AMD architectures with multiple LLCs.

Abstract

Commodity multicore systems are increasingly adopting hardware support that enables system software to partition the last-level cache (LLC). This support makes it possible for the operating system (OS) or the Virtual Machine Monitor (VMM) to mitigate shared-resource contention effects on multicores by assigning different co-running applications to various cache partitions. Recently, cache-clustering (or partition-sharing) strategies have emerged as a way to improve system throughput and fairness on new platforms with cache-partitioning support. As opposed to strict cache-partitioning, which allocates a separate cache partition to each application, cache-clustering allows partitions to be shared by a group of applications. In this article we propose LFOC+, a fairness-aware OS-level cache-clustering policy for commodity multicore systems. LFOC+ tries to mimic the behavior of the optimal cache-clustering solution for fairness, which we were able to obtain for different workload scenarios by using a simulation tool. Our dynamic cache-clustering strategy continuously gathers data from performance monitoring counters to classify applications at runtime based on their degree of cache sensitivity and contentiousness, and effectively separates cache-sensitive applications from aggressor programs to improve fairness while providing acceptable system throughput. We implemented LFOC+ in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of four previously proposed cache-clustering policies. Our experimental analysis reveals that LFOC+ constitutes a lightweight OS-level policy and improves fairness relative to two other state-of-the-art fairness-aware strategies, Dunn and LFOC, by up to 22% and up to 20.6%, respectively, and by 9% and 4.9% on average.
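
The classify-then-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the thresholds, the field names, the category labels other than "sensitive" and "aggressor", and the rule of placing all aggressors in a single cluster are assumptions made here purely for illustration. The only constraint taken from the summary is that a cluster holds at most two cache-sensitive applications and that sensitive applications are kept apart from aggressors.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the source gives no concrete values.
SENSITIVE_SLOWDOWN = 1.10   # slowdown observed when the app gets few LLC ways
AGGRESSOR_LLCMPKC = 10.0    # LLC misses per 1000 cycles


@dataclass
class App:
    name: str
    slowdown_low_ways: float  # estimated slowdown with a small LLC share
    llcmpkc: float            # LLC misses per kilo-cycle (from PMCs)


def classify(app: App) -> str:
    """Label an app from its (assumed) PMC-derived metrics."""
    if app.slowdown_low_ways >= SENSITIVE_SLOWDOWN:
        return "sensitive"
    if app.llcmpkc >= AGGRESSOR_LLCMPKC:
        return "aggressor"
    return "light"


def cluster(apps: list[App]) -> list[list[str]]:
    """Group apps so that no cluster mixes sensitive apps with aggressors
    and no cluster holds more than two sensitive apps."""
    # Most sensitive first, so the pairing order is deterministic.
    sensitive = sorted(
        (a for a in apps if classify(a) == "sensitive"),
        key=lambda a: a.slowdown_low_ways,
        reverse=True,
    )
    clusters = [[a.name for a in sensitive[i:i + 2]]
                for i in range(0, len(sensitive), 2)]
    # Simplification: one shared cluster per remaining category.
    for label in ("aggressor", "light"):
        names = [a.name for a in apps if classify(a) == label]
        if names:
            clusters.append(names)
    return clusters


if __name__ == "__main__":
    workload = [
        App("a", 1.5, 2.0), App("b", 1.3, 1.0), App("c", 1.2, 3.0),
        App("d", 1.0, 20.0), App("e", 1.0, 0.5),
    ]
    print(cluster(workload))
```

A real policy would also decide how many cache ways each cluster receives and would re-evaluate the classification as PMC samples change; the sketch covers only the grouping step.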

Paper Structure

This paper contains 15 sections, 3 equations, 10 figures, 5 tables, 3 algorithms.

Figures (10)

  • Figure 1: Slowdown and LLCMPKC for different way counts
  • Figure 2: Cluster count and breakdown of applications into different categories for each cluster size
  • Figure 3: Distribution of LLC-space made by (a) UCP-Slowdown and (b) Best fairness-wise clustering with clusters of up to 2 applications
  • Figure 4: Sampling mode's LLC space distribution
  • Figure 5: Multiprogram workloads used for our experiments. The "x2" mark indicates that 2 instances of a benchmark are present in a workload
  • ...and 5 more figures