
Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures

Fabian Ritter, Sebastian Hack

TL;DR

This work modifies the port mapping inference algorithm of the widely used uops.info project so that it no longer relies on Intel's performance counters. It investigates to what extent AMD's processors comply with this model and where unexpected performance characteristics prevent an accurate port mapping.

Abstract

Performance models are instrumental for optimizing performance-sensitive code. When modeling the use of functional units of out-of-order x86-64 CPUs, data availability varies by manufacturer: instruction-to-port mappings for Intel's processors are available, whereas information for AMD's designs is lacking. The reason for this disparity is that standard techniques to infer exact port mappings require hardware performance counters that AMD does not provide. In this work, we modify the port mapping inference algorithm of the widely used uops.info project to not rely on Intel's performance counters. The modifications are based on a formal port mapping model with a counter-example-guided algorithm powered by an SMT solver. We investigate to what extent AMD's processors comply with this model and where unexpected performance characteristics prevent an accurate port mapping. Our results provide valuable insights for creators of CPU performance models as well as for software developers who want to achieve peak performance on recent AMD CPUs.

Paper Structure (35 sections, 13 equations, 5 figures, 2 tables, 2 algorithms)


Figures (5)

  • Figure 1: Simplified overview of a modern processor design (based on Figure 2-8 in the Intel Software Optimization Manual [intel-opt-manual]).
  • Figure 2: Example port mapping (a) and optimal µop distribution for [mul, mul, fma] (b). The processor executes two $u_1$ µops (for the fma instruction) and three $u_2$ µops (one for the fma instruction and one for each mul instruction) for this instruction sequence. Only port $p_2$ can handle $u_2$ while $u_1$ could be executed on either port.
  • Figure 3: Possible steady-state distributions of µops per port in benchmarks of fma with (a) 3 mul and (b) 6 add blocking instructions, using the port mapping from \ref{fig:ex_uopsinfo_algo:mapping}.
  • Figure 4: Port mappings that satisfy $\bigl\{ ([i_A], 1.0), ([i_B], 1.0) \bigr\}$.
  • Figure 5: IPC prediction accuracy for Zen+ in metrics (a) and as heatmaps of predicted vs. measured IPC per model (b-d).
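
The µop counts in the Figure 2 caption can be checked with a small sketch. It assumes a two-port machine ($p_1$, $p_2$) and encodes the mapping as the caption describes it: mul decomposes into one $u_2$, fma into two $u_1$ and one $u_2$. The names `UOP_PORTS`, `INSTR_UOPS`, `uop_counts`, and `inverse_throughput` are hypothetical and chosen only for illustration. The sketch uses the standard bottleneck formulation of such a port mapping model (the busiest subset of ports determines the cycle count); it is not the paper's inference algorithm.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding of the mapping described in the Figure 2 caption
# (assumption: the machine has exactly the two ports p1 and p2).
UOP_PORTS = {"u1": frozenset({"p1", "p2"}), "u2": frozenset({"p2"})}
INSTR_UOPS = {"mul": {"u2": 1}, "fma": {"u1": 2, "u2": 1}}


def uop_counts(seq):
    """Sum the µops of each kind that an instruction sequence decomposes into."""
    counts = {}
    for instr in seq:
        for uop, n in INSTR_UOPS[instr].items():
            counts[uop] = counts.get(uop, 0) + n
    return counts


def inverse_throughput(seq):
    """Cycles per iteration under an optimal µop-to-port distribution.

    For every non-empty port subset Q, the µops that can only run on ports
    in Q need at least (their count) / |Q| cycles; the maximum over all Q
    is the bottleneck.
    """
    counts = uop_counts(seq)
    ports = sorted(set().union(*UOP_PORTS.values()))
    best = Fraction(0)
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            q = set(subset)
            load = sum(n for u, n in counts.items() if UOP_PORTS[u] <= q)
            best = max(best, Fraction(load, k))
    return best


if __name__ == "__main__":
    seq = ["mul", "mul", "fma"]
    print(uop_counts(seq))          # two u1 and three u2, as in the caption
    print(inverse_throughput(seq))  # bottleneck in cycles per iteration
```

Under these assumptions, port $p_2$ is the bottleneck for [mul, mul, fma]: it must execute all three $u_2$ µops, while the two $u_1$ µops fit on $p_1$. Enumerating port subsets is exponential in the number of ports, which is acceptable for a sketch with a handful of ports.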