CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor
Buqing Xu, Jianfeng Zhu, Yichi Zhang, Qinyi Cai, Guanhua Li, Shaojun Wei, Leibo Liu
TL;DR
This paper tackles the bottleneck of slow cycle-level CPU simulators for full benchmarks by introducing CAPSim, an attention-based performance predictor coupled with code trace clipping. By partitioning programs into code trace clips, constructing a context-rich representation of CPU state, and training a transformer-style predictor, CAPSim predicts full-program execution time with GPU-accelerated inference. The approach achieves up to 8.3x speedups over gem5 while maintaining accuracy close to cycle-level simulators, and demonstrates strong generalization across benchmarks and microarchitectures. The work highlights a practical path for rapid architectural exploration using deep learning to model long-range instruction dependencies and context-aware performance effects.
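The clip-then-predict flow summarized above can be illustrated with a minimal sketch. It assumes a fixed clip length, a trace given as a list of instruction strings, and a whole-program estimate formed by summing per-clip predictions. None of these details are confirmed by this summary, and the names `clip_trace`, `predict_total_cycles`, and `toy_predictor` are hypothetical. The toy predictor is a placeholder for the trained attention-based model, not a description of it.

```python
from typing import Callable, List, Sequence


def clip_trace(trace: Sequence[str], clip_len: int) -> List[List[str]]:
    """Split an instruction trace into consecutive clips of at most clip_len instructions."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(trace[i:i + clip_len]) for i in range(0, len(trace), clip_len)]


def predict_total_cycles(
    trace: Sequence[str],
    clip_len: int,
    predict_clip: Callable[[List[str]], float],
) -> float:
    """Estimate whole-trace cycles by summing per-clip predictions (an assumption)."""
    return sum(predict_clip(clip) for clip in clip_trace(trace, clip_len))


def toy_predictor(clip: List[str]) -> float:
    """Hypothetical stand-in for a learned predictor: loads cost 3 cycles, everything else 1."""
    return float(sum(3 if ins.startswith("ld") else 1 for ins in clip))


if __name__ == "__main__":
    trace = ["ld r1", "add r2", "mul r3", "ld r4", "st r5"]
    print(clip_trace(trace, 2))
    print(predict_total_cycles(trace, 2, toy_predictor))
```

In a real pipeline the per-clip predictor would be a batched neural network call, which is where GPU-accelerated inference would apply, since clips can be scored independently of one another.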
Abstract
CPU simulators are vital for computer architecture research, primarily for estimating performance under different programs. Simulating modern CPUs both quickly and accurately is challenging, especially for multi-core systems. Modern CPU performance simulators such as gem5 adopt a cycle-accurate, event-driven approach, which makes it time-consuming to simulate the extensive microarchitectural behavior of a real benchmark running on out-of-order CPUs. Recently, machine learning based approaches have been proposed to improve simulation speed, but they are currently limited to estimating the cycles of basic blocks rather than complete benchmark programs. This paper introduces a novel ML-based CPU simulator named CAPSim, which uses an attention-based neural network performance predictor and an instruction trace sampling method annotated with context. The attention mechanism effectively captures long-range influence within the instruction trace, emphasizing critical context information. This allows the model to improve performance prediction accuracy by focusing on important instructions. CAPSim can predict the execution time of unseen benchmarks significantly faster than an accurate O3 simulator built with gem5. Our evaluation on a commercial Intel Xeon CPU demonstrates that CAPSim achieves a 2.2-8.3x speedup over the gem5-built simulator, which is superior to the cutting-edge deep learning approach.
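To make the idea of attention capturing long-range influence concrete, here is a minimal sketch of standard scaled dot-product attention in plain Python. It illustrates the general mechanism only, not CAPSim's actual network. The vector sizes and example inputs are assumptions chosen for illustration, and `softmax` and `attention` are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query yields a weighted average of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, computed one output dimension at a time.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[2.0], [4.0]]
    # A zero query attends to both positions equally.
    print(attention([[0.0, 0.0]], keys, values))
    # A query aligned with the first key attends almost entirely to the first value.
    print(attention([[50.0, 0.0]], keys, values))
```

Because every position can weight every other position regardless of distance, an instruction far back in the trace can still contribute strongly to the prediction for a later one, provided its key matches the query.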
