
CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor

Buqing Xu, Jianfeng Zhu, Yichi Zhang, Qinyi Cai, Guanhua Li, Shaojun Wei, Leibo Liu

TL;DR

This paper tackles the bottleneck of slow cycle-level CPU simulators for full benchmarks by introducing CAPSim, an attention-based performance predictor coupled with code trace clipping. By partitioning programs into code trace clips, constructing a context-rich representation of CPU state, and training a transformer-style predictor, CAPSim predicts full-program execution time with GPU-accelerated inference. The approach achieves up to 8.3x speedups over gem5 while maintaining accuracy close to cycle-level simulators, and demonstrates strong generalization across benchmarks and microarchitectures. The work highlights a practical path for rapid architectural exploration using deep learning to model long-range instruction dependencies and context-aware performance effects.
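The clip-and-sum idea described above can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping clips, a fixed number of preceding instructions as each clip's context, and that every clip is scored and the scores are summed. CAPSim's actual clip sizes, sampling, and context features are not described here. All names (`make_clips`, `predict_program_cycles`, `toy_predictor`, `LATENCY`) are hypothetical, and the lookup-table predictor only stands in for the trained attention model.

```python
from typing import Callable, List, Sequence, Tuple

# A clip is (context instructions, clip instructions). Instructions are plain
# mnemonic strings in this sketch; a real trace would carry operands, addresses, etc.
Clip = Tuple[List[str], List[str]]
Predictor = Callable[[List[str], List[str]], float]


def make_clips(trace: Sequence[str], clip_len: int, ctx_len: int) -> List[Clip]:
    """Split a dynamic instruction trace into consecutive, non-overlapping clips.

    Each clip is paired with up to ctx_len instructions that immediately precede it
    (an assumed form of context annotation). Context is fed to the predictor but its
    cycles are not counted again, so every instruction belongs to exactly one clip.
    """
    if clip_len <= 0 or ctx_len < 0:
        raise ValueError("clip_len must be > 0 and ctx_len must be >= 0")
    clips: List[Clip] = []
    for start in range(0, len(trace), clip_len):
        context = list(trace[max(0, start - ctx_len):start])
        body = list(trace[start:start + clip_len])
        clips.append((context, body))
    return clips


def predict_program_cycles(trace: Sequence[str], clip_len: int, ctx_len: int,
                           predictor: Predictor) -> float:
    """Estimate whole-program cycles as the sum of per-clip predictions."""
    return sum(predictor(ctx, body) for ctx, body in make_clips(trace, clip_len, ctx_len))


# Hypothetical stand-in for the trained model: a fixed per-opcode latency table
# that ignores the context (a learned predictor would use it).
LATENCY = {"load": 4.0, "add": 1.0, "mul": 3.0}


def toy_predictor(context: List[str], body: List[str]) -> float:
    return sum(LATENCY[op] for op in body)


trace = ["load", "add", "mul", "add", "load", "mul", "add"]
clips = make_clips(trace, clip_len=3, ctx_len=2)
total = predict_program_cycles(trace, clip_len=3, ctx_len=2, predictor=toy_predictor)
print(len(clips), total)  # 3 17.0
```

Because each clip is scored independently of the others, the per-clip predictor calls can be batched, which is where GPU-accelerated inference pays off.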

Abstract

CPU simulators are vital for computer architecture research, primarily for estimating performance under different programs. Simulating modern CPUs both quickly and accurately is challenging, especially in multi-core systems. Modern CPU performance simulators such as gem5 adopt a cycle-accurate, event-driven approach, which makes simulating the extensive microarchitectural behavior of a real benchmark running on out-of-order CPUs time-consuming. Recently, machine-learning-based approaches have been proposed to improve simulation speed, but they are currently limited to estimating the cycles of basic blocks rather than complete benchmark programs. This paper introduces a novel ML-based CPU simulator named CAPSim, which uses an attention-based neural network performance predictor and an instruction trace sampling method annotated with context. The attention mechanism effectively captures long-range influence within the instruction trace, emphasizing critical context information. This allows the model to improve performance prediction accuracy by focusing on important instructions. CAPSim can predict the execution time of unseen benchmarks significantly faster than an accurate O3 simulator built with gem5. Our evaluation on a commercial Intel Xeon CPU demonstrates that CAPSim achieves a 2.2-8.3x speedup over the gem5-based simulator, outperforming the cutting-edge deep learning approach.
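To make the attention claim concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention over instruction embeddings. It is not CAPSim's network: it assumes toy 2-dimensional embeddings, no learned query/key/value projections (Q = K = V), and a hypothetical mean-pool plus linear head (`predict_cycles`) standing in for the predictor's output layer.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    Every instruction position attends to every other position, however far
    apart, so distant instructions can influence each other's representation.
    Returns (outputs, attention weights).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    weights = [softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x])
               for q in x]
    outputs = [[sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
               for w_row in weights]
    return outputs, weights


def predict_cycles(x: Matrix, head_w: List[float], head_b: float) -> float:
    """Hypothetical regression head: mean-pool attended outputs, then a linear map."""
    outputs, _ = self_attention(x)
    n, d = len(outputs), len(outputs[0])
    pooled = [sum(row[j] for row in outputs) / n for j in range(d)]
    return sum(w * p for w, p in zip(head_w, pooled)) + head_b


# Toy 2-d embeddings for a 4-instruction clip: positions 0 and 2 look alike,
# as do positions 1 and 3.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x)
print([round(w, 3) for w in weights[0]])  # [0.335, 0.165, 0.335, 0.165]
print(predict_cycles(x, head_w=[2.0, 2.0], head_b=1.0))  # approximately 3.0
```

Instruction 0 gives the same weight to instruction 2 as to itself, regardless of the distance between them; this position-independent weighting is the property that lets attention pick up long-range dependencies in a trace.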

Paper Structure

This paper contains 22 sections, 11 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: The workflow of CAPSim.
  • Figure 2: The workflow of producing code trace clips from the SPEC 2017 suite.
  • Figure 3: The workflow of the trace clip sampler.
  • Figure 4: Overview of the performance predictor.
  • Figure 5: Examples of standardization transformation.
  • ...and 6 more figures