A TRRIP Down Memory Lane: Temperature-Based Re-Reference Interval Prediction For Instruction Caching

Henry Kao, Nikhil Sreekumar, Prabhdeep Singh Soni, Ali Sedaghati, Fang Su, Bryan Chan, Maziar Goudarzi, Reza Azimi

TL;DR

The paper tackles CPU frontend stalls in mobile systems caused by instruction cache misses on modern, code-footprint-heavy software. It introduces TRRIP, a software-hardware co-design that leverages PGO-derived code temperature (hot/warm/cold) tagged at the page level to bias a temperature-aware instruction cache replacement policy, implemented with minimal hardware changes and no ISA modifications. TRRIP demonstrates a reduction in L2 instruction MPKI by $26.5\%$ (TRRIP-1) or $27.3\%$ (TRRIP-2) and a geomean speedup of $3.9\%$ over a baseline RRIP with PGO-optimized workloads, outperforming several hardware-only and hybrid approaches. The work highlights practical adoption aspects, including OS- and compiler-aided temperature signaling, and discusses limitations related to coverage of costly misses in external code and pages, proposing future extensions to broader cache hierarchies and server-class systems.

Abstract

Modern mobile CPU software poses challenges for conventional instruction cache replacement policies because its complex runtime behavior causes high reuse distance between executions of the same instruction. Mobile code commonly suffers from a large number of stalls in the CPU frontend, which starve the rest of the CPU resources. The complexity of these applications and their code footprint are projected to grow faster than available on-chip memory due to power and area constraints, making conventional hardware-centric methods for managing instruction caches inadequate. We present a novel software-hardware co-design approach called TRRIP (Temperature-based Re-Reference Interval Prediction) that enables the compiler to analyze, classify, and transform code based on "temperature" (hot/cold), and to provide the hardware with a summary of code temperature information through a well-defined OS interface based on code page attributes. TRRIP's lightweight hardware extension uses code temperature attributes to optimize the instruction cache replacement policy, reducing the eviction rate of hot code. TRRIP is designed to be practical and adoptable in real mobile systems that have strict feature requirements on both the software and hardware components. TRRIP can reduce the L2 MPKI for instructions by 26.5%, resulting in a geomean speedup of 3.9%, on top of RRIP cache replacement running mobile code already optimized using PGO.
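To make the idea of temperature-biased replacement concrete, here is a minimal sketch of a single RRIP-managed cache set whose insertion value depends on a per-page temperature hint. The insertion values in `INSERT_RRPV`, the class name `CacheSet`, and the 2-bit RRPV width are assumptions chosen for illustration; the abstract does not specify the actual TRRIP-1 or TRRIP-2 policies, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass, field

RRPV_BITS = 2
RRPV_MAX = (1 << RRPV_BITS) - 1  # 3 = predicted distant re-reference

# Hypothetical insertion values per code temperature (an assumption, not
# taken from the paper): hot lines are predicted to be re-referenced soon,
# cold lines are immediate eviction candidates, and everything else uses
# the usual "long" RRIP insertion value.
INSERT_RRPV = {"hot": 0, "warm": RRPV_MAX - 1, "cold": RRPV_MAX}


@dataclass
class CacheSet:
    ways: int
    lines: dict = field(default_factory=dict)  # tag -> RRPV

    def access(self, tag, temperature="warm"):
        """Return True on a hit; on a miss, fill the line and return False."""
        if tag in self.lines:
            self.lines[tag] = 0  # hit promotion
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = INSERT_RRPV[temperature]
        return False

    def _evict(self):
        """Evict the first line with RRPV_MAX, aging all lines until one exists."""
        while True:
            victim = next(
                (t for t, r in self.lines.items() if r == RRPV_MAX), None
            )
            if victim is not None:
                del self.lines[victim]
                return victim
            for t in self.lines:
                self.lines[t] += 1


def hot_line_survives(use_temperature):
    """Stream cold code past one hot line in a 2-way set, then re-access it."""
    s = CacheSet(ways=2)
    s.access("H", "hot" if use_temperature else "warm")
    for tag in ("c1", "c2", "c3"):
        s.access(tag, "cold" if use_temperature else "warm")
    return s.access("H", "hot" if use_temperature else "warm")
```

In this toy set, `hot_line_survives(True)` keeps the hot line resident while cold lines replace each other, whereas `hot_line_survives(False)` (all lines inserted identically) loses it to the streaming lines.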

Paper Structure

This paper contains 25 sections, 2 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Top-Down breakdown of the hottest system software components, including a code interpreter (interp) and shared libraries for a user-interface framework (ui), graphics (graphics), rendering (render), and a JavaScript runtime (js_runtime).
  • Figure 2: Top-Down profiles of proxy mobile benchmarks. Non-PGO and PGO compiles (the latter marked with "*") are shown. Cycles spent doing useful computation are shown as retire. The rest are cycles lost due to frontend stalls from instruction cache misses (ifetch), branch misprediction (mispred.), data dependencies (depend), saturated issue queues (issue), and backend stalls waiting for data from caches and main memory (mem).
  • Figure 3: Reuse distance distribution of hot cache lines measured in the L2 cache. Reuse distance is measured as the number of unique cache lines seen between two successive accesses to the same line within a given cache set. Applications suffixed with "$\sim$" count only the hot unique cache lines seen between two successive accesses to the same line within a given cache set.
  • Figure 4: Co-designed components and interfaces for TRRIP cache replacement.
  • Figure 5: ELF layout after PGO showing only the components TRRIP modifies, program headers and .text sections.
  • ...and 4 more figures
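
The reuse-distance metric described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `reuse_distances`, the modulo set-index mapping, and the `only` filter (standing in for the "hot lines only" variant) are hypothetical and not taken from the paper's tooling.

```python
from collections import defaultdict


def reuse_distances(trace, num_sets, only=None):
    """Per-set reuse distance for a trace of cache-line addresses.

    For each re-access of a line, count the unique lines accessed in the
    same set since the previous access to that line. If `only` is given
    (a set of line addresses, e.g. the hot lines), just those lines are
    counted. The set index is assumed to be `line % num_sets`.
    """
    history = defaultdict(list)  # set index -> lines accessed, in order
    last_pos = {}                # line -> index of its last access in history
    result = []
    for line in trace:
        hist = history[line % num_sets]
        if line in last_pos:
            between = set(hist[last_pos[line] + 1:])
            if only is not None:
                between &= only
            result.append((line, len(between)))
        last_pos[line] = len(hist)
        hist.append(line)
    return result
```

For example, with four sets the trace `[0, 4, 8, 4, 0]` maps entirely to set 0: the second access to line 4 sees one unique line (8) in between, and the second access to line 0 sees two (4 and 8). Restricting the count with `only={4}` reduces the latter to one.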