
ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes

Amund Bergland Kvalsvik, Magnus Själander

TL;DR

ShadowBinding provides RTL-based microarchitectural designs for two in-core secure speculation schemes, NDA and STT, and demonstrates that in-core security incurs substantial, architecture-dependent costs. It shows that STT-Rename introduces a single-cycle YRoT dependency chain, that STT-Issue mitigates this by delaying tainting to the issue stage, and that NDA offers a simpler, more timing-friendly alternative. RTL experiments on the RISC-V BOOM and comparisons with gem5 reveal IPC losses of approximately 18.1% (STT-Rename), 15.5% (STT-Issue), and 26.4% (NDA), with overall performance slowdowns of up to 34.5% (STT-Rename), 26.8% (STT-Issue), and 21.5% (NDA) on the highest-performance core; extrapolation suggests even larger costs for leading processors. The findings challenge prior simulator-based estimates and highlight the need for careful microarchitectural design and evaluation when adopting secure speculation schemes. The work argues that NDA may currently offer the best practical balance among in-core defenses and emphasizes the importance of detailed hardware evaluation for industry adoption and sustainability considerations.

Abstract

Secure speculation schemes have shown great promise in the war against speculative side-channel attacks, and will be a key building block for developing secure, high-performance architectures moving forward. As the field matures, the need for rigorous microarchitectures, and corresponding performance and cost analysis, becomes critical for evaluating secure schemes and for enabling their future adoption. In ShadowBinding, we present effective microarchitectures for two state-of-the-art secure schemes, uncovering and mitigating fundamental microarchitectural limitations within the analyzed schemes, and provide important design characteristics. We uncover that Speculative Taint Tracking's (STT's) rename-based taint computation must be completed in a single cycle, creating an expensive dependency chain which greatly limits performance for wider processor cores. We also introduce a novel microarchitectural approach for STT, named STT-Issue, which, by delaying the taint computation to the issue stage, eliminates the dependency chain, achieving better instructions per cycle (IPC), timing, area, and performance results. Through a comprehensive evaluation of our STT and Non-Speculative Data Access (NDA) microarchitectural designs on the RISC-V Berkeley Out-of-Order Machine, we find that the IPC impact of in-core secure schemes is higher than previously estimated, close to 20% for the highest-performance core. With insights into timing from our RTL evaluation, the performance loss, created by the combined impact of IPC and timing, becomes even greater, at 35%, 27%, and 22% for STT-Rename, STT-Issue, and NDA, respectively. If these trends were to hold for leading processor core designs, the performance impact would be well over 30%, even for the best-performing scheme.
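The single-cycle dependency chain described in the abstract can be illustrated with a small software model. The sketch below is a toy model under stated assumptions, not the paper's RTL: the `Instr` type, the `rename_group_yrot` function, the use of architectural register numbers as taint-table keys, and the rule that every load in the group is speculative and becomes the root of taint for its destination are all hypothetical simplifications. It computes, for one rename group, each instruction's youngest root of taint (YRoT) and the length of the longest chain of same-cycle dependencies that would have to resolve within that cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    seq: int                  # program-order sequence number (hypothetical ID)
    srcs: tuple               # architectural source registers
    dst: Optional[int]        # architectural destination register, if any
    is_load: bool = False     # assumption: every load here is speculative


def youngest(a, b):
    """Return the younger (larger sequence number) of two optional roots."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def rename_group_yrot(group, taint_table):
    """Compute YRoTs for one rename group, in program order.

    Returns (per-instruction YRoTs, longest in-group chain, updated table).
    The chain length counts how many YRoT computations depend on each other
    within the same cycle.
    """
    table = dict(taint_table)   # register -> YRoT (None/absent = untainted)
    depth_of_reg = {}           # chain depth of registers written this cycle
    results = []
    longest = 0
    for ins in group:
        yrot = None
        depth = 1
        for r in ins.srcs:
            yrot = youngest(yrot, table.get(r))
            # Reading a register written earlier in this group extends the chain.
            depth = max(depth, 1 + depth_of_reg.get(r, 0))
        results.append((ins.seq, yrot))
        longest = max(longest, depth)
        if ins.dst is not None:
            if ins.is_load:
                # Assumed rule: the load itself is the root for its output,
                # so consumers do not wait on the load's operand YRoTs.
                table[ins.dst] = ins.seq
                depth_of_reg[ins.dst] = 0
            else:
                table[ins.dst] = yrot
                depth_of_reg[ins.dst] = depth
    return results, longest, table


# Three back-to-back dependent ALU instructions in one rename group:
#   r1 = r2 + r3 ; r4 = r1 + r5 ; r6 = r4 + r1, with r2 tainted by load 7.
group = [Instr(10, (2, 3), 1), Instr(11, (1, 5), 4), Instr(12, (4, 1), 6)]
results, longest, _ = rename_group_yrot(group, {2: 7})
print(results, longest)
```

In this model the chain length grows with the number of dependent instructions renamed together, which is the intuition behind the cost for wider cores. An issue-stage variant would, under the same assumptions, evaluate each instruction's YRoT on its own once its operands are known, so no in-group chain would arise.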

Paper Structure

This paper contains 33 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Normalized performance (IPC × Timing) of evaluated secure speculation schemes, with trend line. Data points are placed based on achieved baseline IPC for different configurations.
  • Figure 2: A register renaming example with three instructions. Architectural source registers are translated to physical registers through reading the register alias table (RAT), and architectural destination registers are assigned physical registers from the free list. Any errors from same-cycle dependencies, such as ① and ②, are corrected one cycle later. More importantly, the RAT is independently updated with the assigned physical registers from the free list.
  • Figure 3: Microarchitecture of YRoT computation for a three instruction-wide rename stage. The critical path is highlighted in red. Dotted lines indicate a translation from index to YRoT. The stippled lines delineate the different instructions.
  • Figure 4: Microarchitecture for STT-Issue. Added structures to support tainting are highlighted in blue. Note that Wakeup and Select are not affected by STT-Issue. The critical path and the YRoT depend only on a single instruction, unlike in STT-Rename.
  • Figure 5: Impact of NDA on broadcast and writeback. Note that NDA requires that the data write be able to select a different physical register than the broadcast.
  • ...and 5 more figures
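
The "IPC × Timing" metric in the Figure 1 caption can be read as a product of two ratios relative to the unprotected baseline. The following is a minimal sketch of that arithmetic, assuming "timing" means the achievable clock frequency relative to the baseline; the function names and the input values are hypothetical illustrations and are not figures reported in the paper.

```python
def normalized_performance(ipc_ratio, timing_ratio):
    """Performance relative to baseline = relative IPC x relative clock frequency."""
    return ipc_ratio * timing_ratio


def slowdown_percent(ipc_ratio, timing_ratio):
    """Performance loss in percent relative to the baseline."""
    return (1.0 - normalized_performance(ipc_ratio, timing_ratio)) * 100.0


# Hypothetical inputs: a scheme that keeps 90% of baseline IPC and
# reaches 90% of the baseline clock frequency.
perf = normalized_performance(0.90, 0.90)
loss = slowdown_percent(0.90, 0.90)
print(f"normalized performance: {perf:.2f}, slowdown: {loss:.1f}%")
# prints "normalized performance: 0.81, slowdown: 19.0%"
```

Because the two factors multiply, a scheme with a moderate IPC loss can still show a much larger overall slowdown once a timing penalty is included, which is why an IPC-only estimate can understate the cost.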