LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores
Adrián García-García, Juan Carlos Sáez, Fernando Castro, Manuel Prieto-Matías
TL;DR
The paper tackles contention in the last-level cache (LLC) of multicore processors and its impact on fairness and throughput. It introduces LFOC, an OS-level cache clustering policy that uses Intel Cache Allocation Technology (CAT) to dynamically partition the LLC and isolate streaming aggressors from cache-sensitive applications. The authors approximate the optimal fairness solution with a parallel simulator and use it as a reference for LFOC, which classifies applications as streaming, light-sharing, or cache-sensitive and allocates cache space with a lightweight online algorithm. Implemented in the Linux kernel and evaluated on an Intel Skylake system, LFOC improves fairness while maintaining competitive throughput, outperforming state-of-the-art partitioning and clustering policies in most scenarios. The work demonstrates a practical, kernel-level solution for fair cache sharing on commodity multicore platforms, with potential impact on cloud, HPC, and general-purpose workloads.
Abstract
Multicore processors are the main architecture choice for modern computing systems across market segments. Despite their benefits, the contention that arises when multiple applications compete for resources shared among cores, such as the last-level cache (LLC), may lead to substantial performance degradation, with a negative impact on key system aspects such as throughput and fairness. Assigning the applications in a workload to separate LLC partitions, possibly of different sizes, has proven effective at mitigating the effects of shared-resource contention. In this article we propose LFOC, a clustering-based cache partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages Intel Cache Allocation Technology (CAT), which enables system software to divide the LLC into partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we approximated by means of a simulator in different scenarios. To this end, LFOC identifies streaming aggressor programs and cache-sensitive applications, which are then assigned to separate cache partitions. We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC achieves a greater reduction in unfairness than these policies while relying on a lightweight algorithm suitable for adoption in a real OS.
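To make the partitioning idea concrete, the following minimal sketch shows how applications that have already been classified could be grouped into two clusters, each given a contiguous, non-overlapping CAT capacity bitmask. It is an illustration under stated assumptions, not LFOC's actual algorithm: the `App` type, the `partition` function, the 11-way LLC, and the choice of confining streaming programs to a single way are all hypothetical. The only hardware fact relied upon is that CAT capacity bitmasks must consist of consecutive set bits.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    kind: str  # "streaming", "light", or "sensitive"; labels assumed to be given


def contiguous_mask(start_way: int, n_ways: int) -> int:
    """Return a bitmask with n_ways consecutive bits set, starting at start_way."""
    return ((1 << n_ways) - 1) << start_way


def partition(apps, total_ways=11, streaming_ways=1):
    """Group apps into clusters and give each cluster a disjoint way mask.

    Hypothetical policy: streaming apps are confined to a small partition,
    and all remaining apps share the rest of the cache ways.
    """
    streaming = [a.name for a in apps if a.kind == "streaming"]
    others = [a.name for a in apps if a.kind != "streaming"]

    plan = {}
    next_way = 0
    if streaming:
        plan["streaming"] = (contiguous_mask(0, streaming_ways), streaming)
        next_way = streaming_ways
    if others:
        plan["rest"] = (contiguous_mask(next_way, total_ways - next_way), others)
    return plan


if __name__ == "__main__":
    workload = [
        App("stream_a", "streaming"),
        App("sens_b", "sensitive"),
        App("light_c", "light"),
    ]
    for cluster, (mask, members) in partition(workload).items():
        print(f"{cluster}: mask={mask:#x} apps={members}")
```

On Linux, masks of this form are what a user-space tool would write to the resctrl `schemata` file; an in-kernel policy such as the one described in the abstract would program the equivalent hardware state directly.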
