Improving Memory Dependence Prediction with Static Analysis

Luke Panayi, Rohan Gandhi, Jim Whittaker, Vassilios Chouliaras, Martin Berger, Paul Kelly

TL;DR

This work addresses memory dependence prediction (MDP) in Out-of-Order CPUs by proposing a static-analysis-based approach to pre-label loads as Predict No Dependency (PND), allowing them to bypass MDP lookups. Implemented as an LLVM IR pass, PND labels are communicated to an AArch64-enabled CPU via minimally invasive opcode changes, enabling zero-cost interaction with hardware while preserving correctness at commit. Simulation in Gem5 over a Spec2017 subset shows an average reduction in MDP lookups of 13% and CPI gains of up to 0.7%, with some benchmarks reaching larger improvements, especially when MDP tables are smaller. The results indicate that static analysis can provide meaningful, near-zero-cost performance benefits and motivate further exploration of IR-based labeling and more advanced MDP algorithms for future CPU designs.

Abstract

This paper explores the potential of communicating information gained by static analysis from compilers to Out-of-Order (OoO) machines, focusing on the memory dependence predictor (MDP). The MDP enables loads to issue before all in-flight store addresses are known, with minimal memory order violations. We use LLVM to find loads with no dependencies and label them via their opcode. These labelled loads skip lookups into the MDP, improving prediction accuracy by reducing false dependencies. We communicate this information in a minimally intrusive way, i.e. without introducing additional hardware costs or instruction bandwidth, so these improvements come without any additional overhead in the CPU. We find that in select cases in Spec2017, a significant number of load instructions can skip interacting with the MDP, leading to a performance gain. These results point to greater possibilities for static analysis as a source of near-zero-cost performance gains in future CPU designs.
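
To make the mechanism in the abstract concrete, the following is a minimal Python sketch of a PC-indexed predictor table in which labelled loads bypass the lookup. It is a toy model for illustration only, not the authors' LLVM pass or the Gem5 predictor: the class `ToyMDP`, its direct-mapped indexing, the table size, and the example PCs are all assumptions introduced here.

```python
class ToyMDP:
    """Toy direct-mapped, PC-indexed table: load PC -> predicted store PC.

    Hypothetical model; real predictors (and the one simulated in the
    paper) are more elaborate.
    """

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.table = {}    # table index -> store PC
        self.lookups = 0   # number of table lookups performed

    def _index(self, pc):
        # Assumes fixed 4-byte instructions, as on AArch64.
        return (pc >> 2) % self.num_entries

    def train(self, load_pc, store_pc):
        """Record a dependence, e.g. after a memory order violation."""
        self.table[self._index(load_pc)] = store_pc

    def predict(self, load_pc, pnd=False):
        """Return the predicted store PC, or None for 'no dependency'."""
        if pnd:
            # Labelled Predict No Dependency: skip the table entirely.
            return None
        self.lookups += 1
        return self.table.get(self._index(load_pc))


mdp = ToyMDP(num_entries=4)
mdp.train(load_pc=0x1000, store_pc=0x2000)  # a genuine dependence

# The load at 0x1010 maps to the same entry as 0x1000 in this small
# table, so an unlabelled lookup inherits a false dependency.
unlabelled = mdp.predict(0x1010)

# The same load, labelled PND, performs no lookup and sees no dependency.
labelled = mdp.predict(0x1010, pnd=True)
```

In this sketch the aliasing between the two load PCs stands in for the false dependencies mentioned above; a smaller table aliases more often, which is one way to read the observation that gains are larger when MDP tables are smaller.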

Paper Structure

This paper contains 24 sections and 3 figures.

Figures (3)

  • Figure 1: Components involved in issuing load instructions in OoO execution. Section 1 of the figure represents the process without speculative execution, and contains the instruction queue to track register dependencies, the store queue (SQ) for loads to find forwarding cases, and the load queue (LQ) for stores to verify proper ordering of loads. Section 2 of the figure introduces speculative execution, and contains the MDP, which is PC-indexed on load dispatch and returns the PC of the stores the load is predicted to depend on.
  • Figure 2: MDP look-ups per kilo-instruction for labelled and unlabelled benchmarks. Lower is better; values are nearly equal across all three CPU size configurations.
  • Figure 3: CPI percent change between labelled and unlabelled binaries. Changes less than 0.5% in magnitude are unlabelled. Lower is better.