CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers

Théophile Bastian; Hugo Pompougnac; Alban Dutilleul; Fabrice Rastello

TL;DR

This work addresses the static prediction of kernel throughput and identifies memory-carried dependencies as a key source of imprecision in code analyzers. It introduces CesASMe, a benchmarking framework that generates in-context, L1-resident microbenchmarks, lifts block-level predictions to kernel-level throughput, and compares them to hardware measurements. To address this dependency blind spot, it proposes staticdeps, a heuristic that statically detects memory-carried dependencies, including across loop iterations, and feeds them to analyzers such as uiCA to improve their accuracy. The evaluation, run over thousands of microbenchmarks, indicates that memory dependencies are a major source of error for static predictors and that adding staticdeps to an existing model tightens its prediction errors. Together, CesASMe and staticdeps provide a methodology and toolchain for evaluating and improving static throughput analyzers.

Abstract

A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks, composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude that memory-carried data dependencies are a major source of imprecision for these tools. We tackle this issue with staticdeps, a static analyzer extracting memory-carried data dependencies, including across loop iterations, from an assembly basic block. We integrate its output into uiCA, a state-of-the-art code analyzer, to evaluate staticdeps' impact on a code analyzer's precision through CesASMe.
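
The summaries above describe analyzers that reason per basic block, while the evaluation compares predictions against measurements of whole kernels. One plausible way to bridge the two is sketched below: a kernel-level prediction obtained as a sum of per-block predictions weighted by how often each block executes. This is only an illustration under stated assumptions, not the harness's actual code: the function names `lift_prediction` and `relative_error`, the block labels, and the choice of an absolute (unsigned) relative error are all hypothetical.

```python
def lift_prediction(block_cycles, block_occurrences):
    """Lift per-block predictions (cycles per execution of the block)
    to a kernel-level prediction, weighting each block by its number
    of occurrences. Assumed formula, for illustration only."""
    missing = set(block_occurrences) - set(block_cycles)
    if missing:
        # A block the analyzer could not handle makes the lifted value meaningless.
        raise KeyError(f"no prediction for blocks: {sorted(missing)}")
    return sum(block_occurrences[b] * block_cycles[b] for b in block_occurrences)


def relative_error(predicted, measured):
    """Absolute relative error of a prediction against a measurement."""
    if measured <= 0:
        raise ValueError("measured cycles must be positive")
    return abs(predicted - measured) / measured


# Hypothetical kernel made of two basic blocks.
cycles = {"bb0": 4.0, "bb1": 2.5}       # analyzer output, cycles per execution
occurrences = {"bb0": 100, "bb1": 10}   # how many times each block ran
lifted = lift_prediction(cycles, occurrences)
error = relative_error(lifted, measured=500.0)
```

Raising on a missing block, rather than silently skipping it, keeps a failed analysis from being mistaken for an optimistic prediction.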

Paper Structure (28 sections, 2 theorems, 6 equations, 5 figures, 4 tables)

Introduction
Contributions
Related works
Benchmarking harness
Predictions lifting.
Architecture of our benchmarking harness.
Generating microbenchmarks
Basic block extraction
Throughput predictions and measures
Prediction lifting and filtering
Soundness of CesASMe's methodology
Understanding BHive's results
Imprecise analysis
Failed analysis
Static extraction of memory-carried dependencies
...and 13 more sections

Key Result

Theorem 1

A dependency between two instructions that are separated by at least $R$ other $\mu$OPs can be ignored.
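
A minimal sketch of how this result could be used to prune candidate dependencies in a loop body is given below. It rests on several assumptions that this page does not confirm: $R$ is treated as a plain numeric parameter, the distance is counted as the number of $\mu$OPs strictly between the producer and the consumer, and a dependency is encoded as a hypothetical `(src, dst, iter_delta)` triple of instruction indices plus the number of iterations separating them. The names `uops_between` and `prune_long_distance` are illustrative only.

```python
def uops_between(uops_per_insn, src, dst, iter_delta):
    """Count the uOPs strictly between instruction `src` at iteration i
    and instruction `dst` at iteration i + iter_delta, for a loop body
    whose k-th instruction decodes into uops_per_insn[k] uOPs.
    (Assumed counting convention, for illustration only.)"""
    n = len(uops_per_insn)
    start = src                      # position of the producer in the unrolled stream
    end = iter_delta * n + dst       # position of the consumer in the unrolled stream
    if end <= start:
        raise ValueError("consumer must come after producer")
    return sum(uops_per_insn[k % n] for k in range(start + 1, end))


def prune_long_distance(deps, uops_per_insn, r):
    """Keep only the dependencies separated by fewer than `r` uOPs."""
    return [d for d in deps if uops_between(uops_per_insn, *d) < r]


# Hypothetical 4-instruction loop body: 5 uOPs per iteration.
body = [1, 2, 1, 1]
candidates = [
    (3, 0, 1),  # last instruction feeds the first one of the next iteration
    (0, 0, 2),  # first instruction feeds itself two iterations later
]
kept = prune_long_distance(candidates, body, r=8)
```

With these made-up numbers, the first candidate has no $\mu$OP in between and is kept, while the second is separated by 9 $\mu$OPs and is dropped for `r=8`.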

Figures (5)

Figure 1: Our analysis and measurement environment.
Figure 2: Relative error distribution w.r.t. perf
Figure 3: Statistical distribution of relative errors
Figure 4: Statistical distribution of relative errors, with and without pruning the rows that are latency-bound through memory-carried dependencies (llvm-mca outliers trimmed)
Figure 5: Statistical distribution of relative errors of uiCA, with and without staticdeps hints, with and without pruning the rows that are latency-bound through memory-carried dependencies

Theorems & Definitions (3)

Definition 1: Distance between instructions
Theorem 1: Long distance dependencies
Lemma 1: Distance of in-flight $\mu$OPs
