Correct Wrong Path

Bhargav Reddy Godala, Sankara Prasad Ramesh, Krishnam Tibrewala, Chrysanthos Pepi, Gino Chacon, Svilen Kanev, Gilles A. Pokam, Daniel A. Jiménez, Paul V. Gratz, David I. August

TL;DR

This work addresses a gap in trace-driven CPU simulation: such simulators fail to capture wrong-path (WP) effects, which can alter cache state and performance. It introduces a WP model, a WP trace format, and modifications to ChampSim that execute and repair WP instructions within a trace-driven flow, enabling WP-aware exploration at near trace-driven speeds. Using gem5 to generate WP traces and a Golden Cove–like core, the authors evaluate 14 server and datacenter workloads, revealing IPC changes from $-3.09\%$ to $20.9\%$ with a mean of $3.26\%$, largely driven by WP prefetching. They show that WP execution reshapes cache behavior across L1I/L1D/L2C/LLC, and they release open WP traces and tooling to accelerate research while preserving IP.

Abstract

Modern out-of-order (OOO) CPUs have very deep pipelines with large branch misprediction recovery penalties. Speculatively executed instructions on the wrong path can significantly change cache state, depending on speculation levels. Architects often employ trace-driven simulation models in the design exploration stage, which sacrifice precision for speed. Trace-driven simulators are orders of magnitude faster than execution-driven models, reducing the often hundreds of thousands of simulation hours needed to explore new micro-architectural ideas. Despite this strong benefit, trace-driven simulators often fail to adequately model the consequences of wrong-path execution because wrong-path instructions are nontrivial to obtain. Prior works consider either a positive or a negative impact of the wrong path, but not both. Here, we examine wrong-path execution in simulation results and design infrastructure for enabling wrong-path execution in a trace-driven simulator. Our analysis shows that the wrong path extensively affects structures on both the instruction and data sides, resulting in performance variations ranging from $-3.05\%$ to $20.9\%$ when ignoring the wrong path. To benefit the research community and enhance the accuracy of simulators, we have released our traces and tracing utility in the hope that industry can provide wrong-path traces generated by their internal simulators, enabling academic simulation without exposing industry IP.
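
The performance variations quoted above are percentage differences in IPC between a wrong-path-aware run and a correct-path-only (CP) run. As a minimal sketch of that arithmetic, assuming the CP-only run is the baseline and the summary statistic is an arithmetic mean (neither is stated in this excerpt), the calculation could look like the following. The function name, workload names, and IPC values are hypothetical and are not taken from the paper.

```python
def relative_ipc_change(ipc_cp: float, ipc_wp: float) -> float:
    """Percent change in IPC of the WP-aware run relative to the CP-only run.

    Assumption: the CP-only run is the baseline; the paper may define
    the direction of the comparison differently.
    """
    if ipc_cp <= 0:
        raise ValueError("baseline IPC must be positive")
    return (ipc_wp - ipc_cp) / ipc_cp * 100.0


# Hypothetical (CP-only IPC, WP-aware IPC) pairs, for illustration only.
samples = {
    "workload_a": (1.00, 1.10),  # WP helps, e.g. through prefetching effects
    "workload_b": (2.00, 1.95),  # WP hurts, e.g. through cache pollution
}

changes = {
    name: relative_ipc_change(cp, wp) for name, (cp, wp) in samples.items()
}

# Arithmetic mean across workloads (an assumption about the summary statistic).
mean_change = sum(changes.values()) / len(changes)

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
print(f"mean: {mean_change:+.2f}%")
```

A positive value means the WP-aware run reports higher IPC than the CP-only run, and a negative value means it reports lower IPC, which is why a single study can observe both signs across workloads.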

Paper Structure

This paper contains 13 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: WP traces from execution-driven to trace-driven simulator
  • Figure 2: Relative Increase in Instructions in WP vs CP
  • Figure 3: Cache stats for WP and CP modes in L1I
  • Figure 4: Cache miss stats for WP and CP modes for all caches
  • Figure 5: Relative increase in cache hits of WP mode w.r.t. CP mode
  • ...and 2 more figures