SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance

Shayne Wadle, Karthikeyan Sankaralingam

TL;DR

SAHM presents a state-aware heterogeneous multicore design that targets single-thread performance by exploiting fine-grained behavioral diversity in workloads. By defining 16 behavioral states from four microarchitectural metrics and migrating threads to cores specialized for current states, SAHM achieves substantial speedups while avoiding the area and power costs of a monolithic high-performance core. Empirical characterization on SPEC 2017 workloads shows distinct state coverage, nonuniform transitions, and long state intervals, justifying state-driven design. Realistic simulations with migration costs and inertia demonstrate robust performance gains, with mean speedups around the $15$–$17\%$ range and resilience to migration overhead, offering a practical path forward for single-thread performance enhancements.

Abstract

Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution face diminishing returns. This work introduces SAHM (State-Aware Heterogeneous Multicore), a novel architecture that targets performance gains by exploiting fine-grained, time-varying behavioral diversity in single-threaded workloads. Through empirical characterization of performance counter data, we define 16 distinct behavioral states representing different microarchitectural demands. Rather than over-provisioning a monolithic core with all optimizations, SAHM uses a set of specialized cores tailored to specific states and migrates threads at runtime based on detected behavior. This design enables composable microarchitectural enhancements without incurring prohibitive area, power, or complexity costs. We evaluate SAHM in both single-threaded and multiprogrammed scenarios, demonstrating its ability to maintain core utilization while improving overall performance through intelligent state-driven scheduling. Experimental results show an opportunity for a 17% speedup in realistic scenarios. These speedups are robust against high-cost migration, decreasing by less than 1%. Overall, state-aware core specialization is a new path forward for enhancing single-thread performance.
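To make the state-driven migration idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each of the four metrics is compared against a single cut-off, so four above/below bits give 2^4 = 16 states, and it assumes "inertia" means a new state must persist for several consecutive samples before the thread migrates. The metric names, cut-off values, and the `InertiaScheduler` class are all hypothetical placeholders.

```python
from typing import Mapping

# Hypothetical metric names and cut-offs (assumptions, not from the paper).
METRICS = ("metric_a", "metric_b", "metric_c", "metric_d")
CUTOFFS = {"metric_a": 0.5, "metric_b": 0.5, "metric_c": 0.5, "metric_d": 0.5}


def state_id(sample: Mapping[str, float]) -> int:
    """Pack four above/below-cut-off bits into a state number in 0..15."""
    state = 0
    for bit, name in enumerate(METRICS):
        if sample[name] > CUTOFFS[name]:
            state |= 1 << bit
    return state


class InertiaScheduler:
    """Migrate only after a new state persists for `inertia` samples (assumed policy)."""

    def __init__(self, inertia: int, initial_state: int = 0) -> None:
        self.inertia = inertia
        self.current = initial_state    # state whose specialized core is in use
        self.candidate = initial_state  # most recently observed differing state
        self.streak = 0                 # consecutive samples seen in candidate
        self.migrations = 0

    def observe(self, state: int) -> int:
        """Feed one sampled state; return the state of the core in use."""
        if state == self.current:
            # Back on the current core's state: drop any pending candidate.
            self.candidate = state
            self.streak = 0
        elif state == self.candidate:
            self.streak += 1
        else:
            self.candidate = state
            self.streak = 1
        if self.candidate != self.current and self.streak >= self.inertia:
            self.current = self.candidate
            self.streak = 0
            self.migrations += 1
        return self.current
```

With `inertia=3`, a brief two-sample excursion into another state does not trigger a migration, which is one simple way to keep short-lived state changes from paying the migration cost.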

Paper Structure

This paper contains 57 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: The canonical configuration of a SAHM system. Each core except the baseline is specialized for the listed component. An example program migration pattern is included.
  • Figure 2: The portion of runtime an application spends in each state, averaged across the SPEC 2017 benchmarks. The intuitive cut-offs capture the most diversity.
  • Figure 3: The portion of an average application spent in each state with the intuitive cut-offs. The majority of states are visited.
  • Figure 4: The percent of runtime spent in each state by application. The breadth of behaviors stands out.
  • Figure 5: The portion of the total number of transitions on average. The diagonal has been removed, and white cells indicate transitions that were not seen in our study. There are no overwhelming outliers that should be designed for.
  • ...and 6 more figures