Dissecting Conditional Branch Predictors of Apple Firestorm and Qualcomm Oryon for Software Optimization and Architectural Analysis

Jiajie Chen, Peng Qu, Youhui Zhang

TL;DR

This paper presents a more general branch prediction reverse engineering pipeline that additionally recovers the conditional branch predictors (CBPs) of the Apple Firestorm and Qualcomm Oryon microarchitectures and uses the results to build accurate CBP models. Software optimizations guided by these models yield up to 14% MPKI reduction and 7% performance improvement in representative applications.

Abstract

The branch predictor (BP) is a critical component of modern processors, and its accurate modeling is essential for compilers and applications. However, processor vendors have disclosed limited details about their BP implementations. Recent advancements in reverse engineering the BPs of general-purpose processors have enabled the creation of more accurate BP models. Nonetheless, we have identified critical deficiencies in the existing methods. For instance, they impose strong assumptions on the branch history update function and the index/tag functions of key BP components, which limits their applicability to a broader range of processors, including those from Apple and Qualcomm. In this paper, we design a more general branch prediction reverse engineering pipeline that can additionally recover the conditional branch predictors (CBPs) of the Apple Firestorm and Qualcomm Oryon microarchitectures, and we subsequently build accurate CBP models. Leveraging these models, we uncover two previously undisclosed effects that impair branch prediction accuracy and propose related solutions, resulting in up to 14% MPKI reduction and 7% performance improvement in representative applications. Furthermore, we conduct a comprehensive comparison of the known Intel/Apple/Qualcomm CBPs using a unified standalone branch predictor simulator, which facilitates a deeper understanding of CBP behavior.

Paper Structure

This paper contains 32 sections, 2 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Structure of TAGE conditional branch predictor
  • Figure 2: The procedure of CBP and reverse engineering it
  • Figure 3: Reverse engineering pipeline: our improvements over the existing pipeline
  • Figure 4: Misprediction rate of the last conditional branch due to different number of branches before it
  • Figure 5: Misprediction rate due to different number of dummy branches and branch address toggle bit
  • ...and 4 more figures