Slice+Slice Baby: Generating Last-Level Cache Eviction Sets in the Blink of an Eye

Bradley Morgan, Gal Horowitz, Sioli O'Connell, Stephan van Schaik, Chitchanok Chuengsatiansup, Daniel Genkin, Olaf Maennel, Paul Montague, Eyal Ronen, Yuval Yarom

TL;DR

The paper tackles the problem of efficiently generating last-level cache (LLC) eviction sets on Intel CPUs with sliced caches, where the slice mapping is hidden and can be non-linear. It introduces a comparator-based microarchitectural gate to predict LLC slices, and develops intra-page propagation methods to infer page-slice mappings from a limited number of measurements. The authors propose three optimizations for full LLC eviction-set generation that drastically speed up construction compared to state-of-the-art methods, including for non-linear slice functions. The work demonstrates substantial speedups across multiple processors and provides open-source code, advancing the practicality of cache attacks and the evaluation of cache defense mechanisms.

Abstract

An essential step for mounting cache attacks is finding eviction sets, collections of memory locations that contend on cache space. On Intel processors, one of the main challenges for identifying contending addresses is the sliced cache design, where the processor hashes the physical address to determine where in the cache a memory location is stored. While past works have demonstrated that the hash function can be reversed, they also showed that it depends on physical address bits that the adversary does not know. In this work, we make three main contributions to the art of finding eviction sets. We first exploit microarchitectural races to compare memory access times and identify the cache slice to which an address maps. We then use the known hash function to both reduce the error rate in our slice identification method and to reduce the work by extrapolating slice mappings to untested memory addresses. Finally, we show how to propagate information on eviction sets across different page offsets for the hitherto unexplored case of non-linear hash functions. Our contributions allow for entire LLC eviction set generation in 0.7 seconds on the Intel i7-9850H and 1.6 seconds on the i9-10900K, both using non-linear functions. This represents a significant improvement compared to state-of-the-art techniques taking 9x and 10x longer, respectively.
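
To make the terms in the abstract concrete, the following is a minimal sketch of how a sliced LLC might map a physical address to a (slice, set) pair, and how addresses that share that pair form the candidates for an eviction set. It covers only the simpler linear (XOR-based) case, not the non-linear functions the paper targets. The masks, the set count, and the helper names `slice_of`, `set_index` and `group_congruent` are made up for illustration and are not taken from the paper or from any real processor. The sketch also assumes physical addresses are known, which an unprivileged attacker does not have.

```python
from collections import defaultdict

LINE_BITS = 6    # 64-byte cache lines
SET_BITS = 10    # assumed: 1024 sets per slice (illustrative only)

# Hypothetical XOR masks, one per slice-index bit. Made up for this sketch;
# they include bits above the 4 KiB page offset, i.e. bits an unprivileged
# attacker cannot see.
SLICE_MASKS = (0x0F0F0F0C0, 0x333333300)


def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def slice_of(paddr: int) -> int:
    """Linear slice hash: each output bit is the parity of masked address bits."""
    s = 0
    for bit, mask in enumerate(SLICE_MASKS):
        s |= parity(paddr & mask) << bit
    return s


def set_index(paddr: int) -> int:
    """Set index within a slice, taken from the bits just above the line offset."""
    return (paddr >> LINE_BITS) & ((1 << SET_BITS) - 1)


def group_congruent(paddrs):
    """Group addresses by (slice, set); each group holds mutually contending lines."""
    groups = defaultdict(list)
    for a in paddrs:
        groups[(slice_of(a), set_index(a))].append(a)
    return groups


if __name__ == "__main__":
    # Walk 1 MiB of line-aligned addresses and count the distinct (slice, set) pairs.
    groups = group_congruent(range(0, 1 << 20, 1 << LINE_BITS))
    print(len(groups), "distinct (slice, set) pairs")
```

Because each output bit of such a hash is a parity, the hash of `a ^ b` equals the XOR of the hashes of `a` and `b`. This is the kind of structure that lets slice information be extrapolated to untested addresses in the linear case; non-linear functions do not have this property.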

Paper Structure

This paper contains 41 sections, 4 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Logical structure of the ring interconnect on Intel Core CPUs. It allows for bi-directional data transfer between cores, slices/sub-slices and other structures (paccagnellaLordRingSide, doi:10.1007/978-3-030-80825-9_14).
  • Figure 2: Operation of the weird NOT gate, which inverts the cached state of input to output. (Adapted from: horowitzSpecoScopeCacheProbing2024.)
  • Figure 3: LLC slice access latency probability distributions for the four-slice i7-6700K, measured from core zero.
  • Figure 4: RDTSCP and fixed-delay chain slice classification accuracy in a low-noise system, measured from core zero.
  • Figure 5: Fixed-delay chain gate win probability as a function of the delay chain length, measured from core zero.
  • ...and 5 more figures