Branch Target Buffer Reverse Engineering on Arm
Junpeng Wan
TL;DR
The paper addresses the lack of public information about ARM BTB implementations by adapting reverse-engineering techniques previously developed for Intel x86 CPUs. It develops a gadget-based measurement framework that uses ARMv8-A PMU counters to infer BTB properties on a Raspberry Pi 4B with a quad-core Cortex-A72. The authors report that the Cortex-A72 BTB has a capacity of 4K entries, an 11-bit set index spanning PC bits 5–15, and 2-way associativity, and they validate these findings with misprediction analysis and heatmaps. This work provides a concrete BTB model for one ARM core that is useful for compiler optimizations and hardware-security research, and it outlines a path for extending the methodology to other ARM designs such as Apple M-series.
Abstract
The Branch Target Buffer (BTB) plays a critical role in efficient CPU branch prediction. Understanding the design and implementation of the BTB provides valuable insights for both compiler design and the mitigation of hardware attacks such as Spectre. However, the proprietary nature of dominant CPUs, such as those from Intel, AMD, Apple, and Qualcomm, means that specific BTB implementation details are not publicly available. To address this limitation, several previous works have successfully reverse-engineered BTB information, including capacity and associativity, primarily targeting Intel's x86 processors. To the best of our knowledge, however, no research has attempted to reverse-engineer and expose the BTB implementation of ARM processors. This project aims to fill that gap by exploring the BTB of ARM processors. Specifically, we investigate whether existing reverse-engineering techniques developed for the Intel BTB can be adapted for ARM. We reproduce the x86 methodology and identify specific ARM PMU events that facilitate the reverse-engineering process. In our experiments, we examine the quad-core Cortex-A72 of the Raspberry Pi 4B. Our results show that the BTB capacity is 4K entries, that the set index spans bits 5 through 15 of the PC (11 bits in total), and that each set has 2 ways. The source code can be found at https://github.com/stefan1wan/BTB_ARM_RE.
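To make the reported parameters concrete, the following minimal sketch shows how a branch address would map to a BTB set under the stated model (set index in PC bits 5–15, 2 ways per set). It is an illustration only and is not taken from the project's source code: the function and constant names are hypothetical, and the example addresses are arbitrary.

```python
# Illustrative model of the reported Cortex-A72 BTB geometry.
# All names here are hypothetical; only the three constants come from the text.
INDEX_LOW_BIT = 5   # lowest PC bit of the set index (reported)
INDEX_BITS = 11     # set index spans PC bits 5..15 (reported)
WAYS = 2            # entries per set (reported)


def btb_set_index(pc: int) -> int:
    """Return the set index that this model assigns to a branch at address pc."""
    return (pc >> INDEX_LOW_BIT) & ((1 << INDEX_BITS) - 1)


def btb_capacity() -> int:
    """Total entries implied by the model: number of sets times ways."""
    return (1 << INDEX_BITS) * WAYS


def same_set(pc_a: int, pc_b: int) -> bool:
    """True if two branch addresses map to the same set in this model."""
    return btb_set_index(pc_a) == btb_set_index(pc_b)


if __name__ == "__main__":
    base = 0x400020            # arbitrary example address
    stride = 1 << 16           # changes only bits above the index field
    print(btb_capacity())                   # sets * ways
    print(btb_set_index(base))              # set index of the example address
    print(same_set(base, base + stride))    # same index bits, same set
```

Under this model, 2^11 sets with 2 ways each give 4096 entries, which matches the reported 4K capacity. Branches whose addresses differ only above bit 15 share a set, so more than two such branches would be expected to contend for the two ways of that set.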
