Improving the Representativeness of Simulation Intervals for the Cache Memory System

Nicolas Bueno, Fernando Castro, Luis Pinuel, Jose Ignacio Gomez-Perez, Francky Catthoor

TL;DR

The paper argues that conventional simulation windows, such as SimPoint intervals or fast-forward strategies, can misrepresent last-level cache (LLC) behavior and thereby bias the evaluation of cache-related microarchitectural proposals. It introduces an LLC-activity-driven approach that reweights the standard SimPoint intervals according to LLC pressure, via two strategies: mpkilru (weights based on LLC MPKI under LRU) and mpkimax (weights based on the maximum LLC MPKI across policies). The authors show that these methods reduce the misordering of cache policies relative to full simulation by over 30% on average and bring LLC MPKI and CPI closer to the full-simulation results, particularly for memory-intensive workloads, without increasing simulation time. The result is a more faithful framework for comparing cache-related ideas, which can benefit energy-use and end-user performance predictions in memory-system design. The weights are formalized as $\mathrm{weight}_s = \mathrm{MPKI}_{\mathrm{LRU},s} / \sum_{i=1}^{n} \mathrm{MPKI}_{\mathrm{LRU},i}$ for mpkilru and $\mathrm{weight}_s = \mathrm{MPKI}_{\mathrm{max},s} / \sum_{i=1}^{n} \mathrm{MPKI}_{\mathrm{max},i}$ for mpkimax, where $s$ indexes a simulated interval and $n$ is the number of intervals.
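
The two weighting formulas can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform fallback for an all-zero MPKI profile, and the use of a weighted sum to combine per-interval metrics (as in standard SimPoint practice) are assumptions made for illustration.

```python
from typing import Mapping, Sequence


def mpki_weights(mpki: Sequence[float]) -> list[float]:
    """Normalize per-interval MPKI values: weight_s = MPKI_s / sum_i MPKI_i."""
    if not mpki:
        raise ValueError("at least one interval is required")
    total = sum(mpki)
    if total <= 0:
        # Assumption: fall back to uniform weights when no interval misses.
        return [1.0 / len(mpki)] * len(mpki)
    return [m / total for m in mpki]


def mpkilru_weights(mpki_lru: Sequence[float]) -> list[float]:
    """mpkilru: weights from the LLC MPKI of each interval under LRU."""
    return mpki_weights(mpki_lru)


def mpkimax_weights(mpki_by_policy: Mapping[str, Sequence[float]]) -> list[float]:
    """mpkimax: weights from the per-interval maximum LLC MPKI across policies."""
    per_interval_max = [max(vals) for vals in zip(*mpki_by_policy.values())]
    return mpki_weights(per_interval_max)


def weighted_estimate(values: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-interval metrics (e.g. MPKI or CPI) into one estimate.

    Assumption: a plain weighted sum over the simulated intervals.
    """
    return sum(v * w for v, w in zip(values, weights))
```

With made-up values, `mpkilru_weights([1.0, 3.0])` yields `[0.25, 0.75]`, so the interval with three times the LLC misses contributes three times as much to the final estimate.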

Abstract

Accurate simulation techniques are indispensable to efficiently propose new memory or architectural organizations. As implementing new hardware concepts in real systems is often not feasible, cycle-accurate simulators are commonly used together with selected benchmarks. However, detailed simulators may take too long to execute these programs to completion, so several techniques aimed at reducing this time are usually employed. These schemes select fragments of the source code considered representative of the entire application's behaviour (mainly in terms of performance, with little consideration of the behaviour of the cache memory levels), and only these intervals are simulated. Our hypothesis is that the simulation windows currently employed when evaluating microarchitectural proposals, especially those involving the last level cache (LLC), do not reproduce the overall cache behaviour of the entire execution, potentially leading to wrong conclusions about the real performance of the proposals assessed. In this work, we first demonstrate this hypothesis by evaluating different cache replacement policies using various typical simulation approaches. We then propose a simulation strategy, based on the applications' LLC activity, which mimics the overall behaviour of the cache much more closely than conventional simulation intervals. Our proposal allows a fairer comparison between cache-related approaches: on average, the number of changes in the relative order among the policies assessed, with respect to the full simulation, is more than 30% lower than that of conventional strategies, while the simulation time remains largely unchanged and no accuracy is lost in terms of performance, especially for memory-intensive applications.
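
The "number of changes in the relative order among the policies" can be made concrete with a short Python sketch. The exact counting rule is not given in this summary, so the version below assumes one plausible definition: the number of policy pairs whose relative order under a sampled strategy is the opposite of their order under full simulation. The function name and the policy labels in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Mapping


def order_changes(full: Mapping[str, float], sampled: Mapping[str, float]) -> int:
    """Count policy pairs ranked in opposite order by the two simulations.

    `full` and `sampled` map a policy name to a metric such as LLC MPKI.
    Assumption: a pair counts as a change only when the two differences
    have strictly opposite signs (ties are not counted).
    """
    if set(full) != set(sampled):
        raise ValueError("both results must cover the same policies")
    changes = 0
    for a, b in combinations(sorted(full), 2):
        if (full[a] - full[b]) * (sampled[a] - sampled[b]) < 0:
            changes += 1
    return changes
```

For instance, with illustrative (made-up) MPKI values `full = {"A": 10.0, "B": 8.0, "C": 12.0}` and `sampled = {"A": 9.0, "B": 9.5, "C": 12.0}`, only the pair (A, B) flips order, so `order_changes(full, sampled)` returns 1.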

Paper Structure (17 sections, 2 equations, 4 figures, 6 tables, 2 algorithms)

Figures (4)

  • Figure 1: LLC MPKI obtained with the four simulation strategies, using gem5, for different benchmarks and cache replacement policies.
  • Figure 2: Zoom into LLC MPKI obtained with SimPoint and full simulation, using gem5, for different benchmarks and cache replacement policies.
  • Figure 3: LLC MPKI values obtained with the full simulation, using gem5, for different benchmarks as instructions are executed using LRU policy.
  • Figure 4: LLC MPKI obtained with full simulation, weight, spt, mpkilru and mpkimax simulation strategies, using gem5, for different benchmarks and cache replacement policies.