LFOC+: A Fair OS-level Cache-Clustering Policy for Commodity Multicore Systems
Juan Carlos Saez, Fernando Castro, Graziano Fanizzi, Manuel Prieto-Matias
TL;DR
LFOC+ tackles contention on the shared last-level cache (LLC) of multicore systems with a fairness-aware OS-level cache-clustering policy. It classifies applications at runtime using performance-monitoring counters (PMCs) and applies a pairing-based clustering strategy that allows up to two cache-sensitive applications per partition. Implemented as a Linux kernel module, LFOC+ extends the authors' prior LFOC approach. On Skylake hardware it delivers substantial fairness improvements over state-of-the-art policies while maintaining comparable throughput, and it approaches the optimal fairness solution in many scenarios. The design draws on offline simulation (PBBCache) for insights and on online PMC-driven adaptation (sampling and fairness phases) to react to workload changes, with special provisions for data-parallel multithreaded programs. The work also offers a practical path toward broader adoption through open-source tooling (PMCTrack) and discusses limitations and future directions, including deeper intra-application partitioning and support for AMD architectures with multiple LLCs.
Abstract
Commodity multicore systems are increasingly adopting hardware support that enables the system software to partition the last-level cache (LLC). This support makes it possible for the operating system (OS) or the Virtual Machine Monitor (VMM) to mitigate shared-resource contention effects on multicores by assigning co-running applications to different cache partitions. Recently, cache-clustering (or partition-sharing) strategies have emerged as a way to improve system throughput and fairness on new platforms with cache-partitioning support. As opposed to strict cache-partitioning, which allocates a separate cache partition to each application, cache-clustering allows a partition to be shared by a group of applications. In this article we propose LFOC+, a fairness-aware OS-level cache-clustering policy for commodity multicore systems. LFOC+ tries to mimic the behavior of the optimal cache-clustering solution for fairness, which we obtained for different workload scenarios by using a simulation tool. Our dynamic cache-clustering strategy continuously gathers data from performance monitoring counters to classify applications at runtime based on their degree of cache sensitivity and contentiousness, and effectively separates cache-sensitive applications from aggressor programs to improve fairness while providing acceptable system throughput. We implemented LFOC+ in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compared its effectiveness to that of four previously proposed cache-clustering policies. Our experimental analysis reveals that LFOC+ constitutes a lightweight OS-level policy and improves fairness relative to two other state-of-the-art fairness-aware strategies, Dunn and LFOC, by up to 22% and up to 20.6%, respectively, and by 9% and 4.9% on average.
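To make the classify-then-cluster idea concrete, the following is a minimal Python sketch, not the LFOC+ kernel implementation. The metrics (`llc_mpki`, `slowdown_small`), both thresholds, and the rule that pairs the most sensitive application with the least sensitive one are hypothetical choices made for illustration; the abstract states only that applications are classified by cache sensitivity and contentiousness, that sensitive applications are separated from aggressors, and that a partition holds at most two cache-sensitive applications.

```python
from dataclasses import dataclass


@dataclass
class AppSample:
    name: str
    llc_mpki: float        # LLC misses per kilo-instruction (hypothetical PMC-derived metric)
    slowdown_small: float  # estimated slowdown with a small cache share (hypothetical)


# Hypothetical thresholds, chosen only for this illustration.
SENSITIVE_SLOWDOWN = 1.10
AGGRESSOR_MPKI = 10.0


def classify(app):
    """Label an application as 'sensitive', 'aggressor', or 'light'."""
    if app.slowdown_small >= SENSITIVE_SLOWDOWN:
        return "sensitive"
    if app.llc_mpki >= AGGRESSOR_MPKI:
        return "aggressor"
    return "light"


def cluster(apps):
    """Group applications so that each cluster has at most two sensitive ones.

    Sensitive applications are paired most-sensitive with least-sensitive
    (an assumed heuristic); all remaining applications share one cluster,
    which keeps aggressors away from the sensitive ones.
    """
    sensitive = sorted(
        (a for a in apps if classify(a) == "sensitive"),
        key=lambda a: a.slowdown_small,
        reverse=True,
    )
    others = [a.name for a in apps if classify(a) != "sensitive"]

    clusters = []
    i, j = 0, len(sensitive) - 1
    while i < j:
        clusters.append([sensitive[i].name, sensitive[j].name])
        i += 1
        j -= 1
    if i == j:  # odd count: the middle application gets its own cluster
        clusters.append([sensitive[i].name])
    if others:
        clusters.append(others)
    return clusters


workload = [
    AppSample("A", llc_mpki=4.0, slowdown_small=1.5),
    AppSample("B", llc_mpki=3.0, slowdown_small=1.2),
    AppSample("C", llc_mpki=5.0, slowdown_small=1.3),
    AppSample("D", llc_mpki=20.0, slowdown_small=1.0),
    AppSample("E", llc_mpki=0.5, slowdown_small=1.0),
]
print(cluster(workload))
```

In this toy workload, A, B, and C are classified as sensitive and D and E are not, so the sketch yields two clusters of sensitive applications plus one shared cluster for the rest. How many cache ways each cluster receives is a separate decision that the sketch does not model.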
