SemanticBBV: A Semantic Signature for Cross-Program Knowledge Reuse in Microarchitecture Simulation

Zhenguo Liu, Chengao Shi, Chen Ding, Jiang Xu

TL;DR

The paper addresses the inability of traditional BBVs to support cross-program knowledge reuse in microarchitecture simulation. It replaces BBVs, whose block IDs are order-dependent, with semantic Basic Block Embeddings (BBEs) and an order-invariant aggregation. The resulting framework, SemanticBBV, has two stages: a RWKV-based semantic encoder that produces BBEs, and a Set Transformer that produces the final signature, trained with both a structural (triplet) objective and a performance-oriented (CPI regression) objective. The approach achieves single-program accuracy comparable to BBVs and enables cross-program performance estimation: 14 representative points yield 86.3% average CPI estimation accuracy across 10 SPEC CPU benchmarks, corresponding to a 7143x simulation speedup. The paper also demonstrates adaptability to new microarchitectures via lightweight fine-tuning and identifies richer hardware performance counters as an opportunity for further improvement.

Abstract

For decades, sampling-based techniques have been the de facto standard for accelerating microarchitecture simulation, with the Basic Block Vector (BBV) serving as the cornerstone program representation. Yet the BBV's fundamental limitations, namely order-dependent IDs that prevent cross-program knowledge reuse and a lack of semantic content predictive of hardware performance, have left substantial optimization potential untapped. To address these gaps, we introduce SemanticBBV, a novel two-stage framework that generates robust, performance-aware signatures for cross-program simulation reuse. First, a lightweight RWKV-based semantic encoder transforms assembly basic blocks into rich Basic Block Embeddings (BBEs), capturing deep functional semantics. Second, an order-invariant Set Transformer aggregates these BBEs, weighted by execution frequency, into a final signature. Crucially, this stage is co-trained with a dual objective: a triplet loss for signature distinctiveness and a Cycles Per Instruction (CPI) regression task that directly imbues the signature with performance sensitivity. Our evaluation demonstrates that SemanticBBV not only matches traditional BBVs in single-program accuracy but also enables unprecedented cross-program analysis. By simulating just 14 universal program points, we estimated the performance of ten SPEC CPU benchmarks with 86.3% average accuracy, achieving a 7143x simulation speedup. Furthermore, the signature shows strong adaptability to new microarchitectures with minimal fine-tuning.
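
As a rough illustration of two ideas in the abstract, the Python sketch below shows (a) a frequency-weighted, order-invariant aggregation of basic block embeddings and (b) a combined triplet-plus-CPI-regression loss. It is not the paper's implementation: a plain weighted mean stands in for the Set Transformer, squared error is assumed for the CPI term, and the names `weighted_signature`, `triplet_loss`, `dual_objective`, the `margin` and `alpha` parameters, and the toy embeddings are all hypothetical.

```python
import math

def weighted_signature(bbes, counts):
    """Aggregate basic block embeddings into one signature.

    Assumption: a frequency-weighted mean replaces the Set Transformer.
    The result depends only on the (embedding, count) pairs, not on
    the order in which the blocks are listed.
    """
    total = float(sum(counts))
    dim = len(bbes[0])
    signature = [0.0] * dim
    for vec, count in zip(bbes, counts):
        weight = count / total
        for i in range(dim):
            signature[i] += weight * vec[i]
    return signature

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss with Euclidean distance."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def dual_objective(anchor, positive, negative, cpi_pred, cpi_true,
                   alpha=1.0, margin=1.0):
    """Triplet loss plus an assumed squared-error CPI regression term.

    alpha is a hypothetical weighting between the two terms.
    """
    regression = (cpi_pred - cpi_true) ** 2
    return triplet_loss(anchor, positive, negative, margin) + alpha * regression

# Toy 2-dimensional embeddings for three basic blocks (made-up values).
bbes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [2, 1, 1]  # execution frequencies within one interval

sig = weighted_signature(bbes, counts)

# Listing the same blocks in a different order gives the same signature.
sig_permuted = weighted_signature([bbes[2], bbes[0], bbes[1]],
                                  [counts[2], counts[0], counts[1]])
```

Because the signature depends only on the embeddings and their execution counts, reordering the blocks leaves it unchanged. A BBV indexed by program-specific block IDs does not have this property, which is why its entries cannot be compared across programs.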

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Comparison of Traditional BBV and SemanticBBV.
  • Figure 2: Overview of the SemanticBBV framework. The weighting mechanism introduced in Stage 2 is detailed in Figure 1.
  • Figure 3: Pre-training tasks.
  • Figure 4: Comparison of simulation accuracy difference between SemanticBBV and traditional BBV on selected SPEC CPU 2017 floating-point benchmarks.
  • Figure 5: The workflow for cross-program performance estimation.
  • ...and 3 more figures