Benchmarking for Single Feature Attribution with Microarchitecture Cliffs

Hao Zhen; Qingxuan Kang; Yungang Bao; Trevor E. Carlson

TL;DR

This work addresses simulator-RTL fidelity in microarchitectural exploration. It introduces Microarchitecture Cliffs (Cliffs), a methodology whose two components, Cliff-SKP and Cliff-BACT, isolate and benchmark a single microarchitectural feature so that the simulator can be calibrated precisely. Applied to XS-GEM5 vs. XS-RTL, Cliffs reduces the overall error on the Cliff benchmarks from 59.2% to 1.4% and improves the accuracy of Store Set feature evaluation (relative error dropping to 0.83%), while also reducing absolute error on SPEC benchmarks. The approach generalizes to other processors (e.g., BOOM) and workloads (e.g., Verilator), providing a practical, scalable path for architecture-aware calibration in early design stages.

Abstract

Architectural simulators play a critical role in early microarchitectural exploration due to their flexibility and high productivity. However, their effectiveness is often constrained by fidelity: simulators may deviate from the behavior of the final RTL, leading to unreliable performance estimates. Consequently, model calibration, which aligns simulator behavior with the RTL as the ground-truth microarchitecture, becomes essential for accurate performance modeling. To improve calibration accuracy, we propose Microarchitecture Cliffs, a benchmark generation methodology designed to expose mismatches in microarchitectural behavior between the simulator and the RTL. After identifying the key architectural components that require calibration, the Cliff methodology uses a set of benchmarks to attribute each microarchitectural difference to a single microarchitectural feature. In addition, we develop a set of automated tools to improve the efficiency of the Cliff workflow. We apply the Cliff methodology to calibrate the XiangShan version of gem5 (XS-GEM5) against the XiangShan open-source CPU (XS-RTL). We reduce the performance error of XS-GEM5 from 59.2% to just 1.4% on the Cliff benchmarks. Meanwhile, the calibration guided by Cliffs reduces the relative error of a representative tightly coupled microarchitectural feature by 48.03%. It also substantially lowers the absolute performance error, with reductions of 15.1% and 21.0% on SPECint2017 and SPECfp2017, respectively.
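The error figures quoted in the abstract compare simulator performance against RTL performance. As a rough illustration only, the sketch below computes a per-benchmark relative IPC error and its mean over a benchmark set. The choice of IPC as the metric, the formula |sim - rtl| / rtl, the function names, and all numeric values are assumptions made for illustration; the abstract does not state the exact error definition the authors use.

```python
from statistics import mean


def relative_error(sim_ipc: float, rtl_ipc: float) -> float:
    """Relative IPC error of the simulator against the RTL ground truth.

    Assumed definition: |sim - rtl| / rtl (not taken from the paper).
    """
    if rtl_ipc <= 0:
        raise ValueError("rtl_ipc must be positive")
    return abs(sim_ipc - rtl_ipc) / rtl_ipc


def mean_relative_error(results: dict) -> float:
    """Average relative error over a {benchmark: (sim_ipc, rtl_ipc)} mapping."""
    return mean(relative_error(sim, rtl) for sim, rtl in results.values())


# Hypothetical IPC pairs (simulator, RTL) before and after calibration.
before = {"bench_a": (1.50, 1.00), "bench_b": (0.50, 1.00)}
after = {"bench_a": (1.00, 1.00), "bench_b": (0.98, 1.00)}

print(f"before: {mean_relative_error(before):.1%}")
print(f"after:  {mean_relative_error(after):.1%}")
```

Averaging absolute per-benchmark errors, rather than signed ones, keeps over- and under-estimates from cancelling each other out, which matters when the goal is to see how far the simulator is from the RTL on each individual benchmark.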

Paper Structure (30 sections, 14 figures, 4 tables)

Introduction
Background
Specification-Driven Calibration
Behavior-Driven Calibration
Motivation: Single Feature Attribution
Design
Cliff-SKP
Cliff-BACT
Bridge the Gap Between the Instruction Level and the Microarchitecture Level
Reduce Interference Between Different Microarchitectural Features Within a Single Benchmark
Organizing Cliff Benchmarks by Microarchitectural Feature
Identifying Undervalued Microarchitectural Features
Cliffs Design Case Study 1: ROB
Determining the Existence of ROB Compression
Determining the Conditions for Compression
...and 15 more sections

Figures (14)

Figure 1: Microbenchmark vs. Cliffs calibration. Microbenchmark confounds the effect of multiple microarchitectural features, while Cliff benchmarks isolate each feature individually.
Figure 2: M-I microbenchmark vs. Cliffs for L1 DCache bandwidth. M-I shows minimal IPC sensitivity to LDPipe, while Cliffs reveal clear distinctions among pipeline widths.
Figure 3: Overview of the Cliff methodology. Cliff-SKP clusters performance-counter probes to identify key architectural bottlenecks. Based on the key architectures, Cliff-BACT constructs Cliff benchmarks that isolate individual microarchitectural traits for single-feature performance attribution.
Figure 4: Second-round clustering of backend performance counters identifies floating-point bandwidth, memory instruction bandwidth, L1/L2 caches, and system memory as concrete directions for Cliff benchmark construction.
Figure 5: ROB-related Cliff benchmarks.
...and 9 more figures
