SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance
Shayne Wadle, Karthikeyan Sankaralingam
TL;DR
SAHM presents a state-aware heterogeneous multicore design that targets single-thread performance by exploiting fine-grained behavioral diversity in workloads. By defining 16 behavioral states from four microarchitectural metrics and migrating each thread to a core specialized for its current state, SAHM achieves substantial speedups while avoiding the area and power costs of a monolithic high-performance core. Empirical characterization of SPEC 2017 workloads shows distinct state coverage, nonuniform transitions between states, and long state intervals, which together justify a state-driven design. Realistic simulations that include migration costs and inertia demonstrate robust performance gains, with mean speedups of roughly 15–17% and resilience to migration overhead, offering a practical path forward for improving single-thread performance.
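The count of 16 states is consistent with reducing each of the four metrics to a high/low bit (2^4 = 16), although the text here does not say exactly how the states are derived. The Python sketch below assumes that binary scheme. It classifies one interval into a state ID and run-length encodes a per-interval state trace, which is the kind of view behind the "long state intervals" observation. The metric names (branch_mpki, l1d_mpki, l2_mpki, ilp), the threshold values, and the helpers classify and state_intervals are all hypothetical.

```python
from itertools import groupby

# Hypothetical metric names: the source says only that four
# microarchitectural metrics are used, not which ones.
METRICS = ("branch_mpki", "l1d_mpki", "l2_mpki", "ilp")


def classify(sample, thresholds):
    """Map one interval's four metrics to a state ID in 0..15.

    Assumption: each metric is reduced to one bit (1 if at or above its
    threshold), so four metrics yield 2**4 = 16 possible states.
    """
    state = 0
    for bit, name in enumerate(METRICS):
        if sample[name] >= thresholds[name]:
            state |= 1 << bit
    return state


def state_intervals(states):
    """Run-length encode a per-interval state trace into (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(states)]


# Hypothetical thresholds, chosen only for illustration.
thresholds = {"branch_mpki": 5.0, "l1d_mpki": 20.0, "l2_mpki": 5.0, "ilp": 2.0}
sample = {"branch_mpki": 8.0, "l1d_mpki": 3.0, "l2_mpki": 1.0, "ilp": 2.5}
print(classify(sample, thresholds))         # 9 (bits 0 and 3 set)
print(state_intervals([9, 9, 9, 1, 1, 9]))  # [(9, 3), (1, 2), (9, 1)]
```

A bit-packed ID keeps each state cheap to compute per interval and makes the 16-way state space explicit. A real design might instead use more than two levels per metric or cluster the raw counter values.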
Abstract
Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution face diminishing returns. This work introduces SAHM (State-Aware Heterogeneous Multicore), a novel architecture that targets performance gains by exploiting fine-grained, time-varying behavioral diversity in single-threaded workloads. Through empirical characterization of performance counter data, we define 16 distinct behavioral states representing different microarchitectural demands. Rather than over-provisioning a monolithic core with every optimization, SAHM uses a set of specialized cores tailored to specific states and migrates threads at runtime based on detected behavior. This design enables composable microarchitectural enhancements without incurring prohibitive area, power, or complexity costs. We evaluate SAHM in both single-threaded and multiprogrammed scenarios, demonstrating its ability to maintain core utilization while improving overall performance through intelligent state-driven scheduling. Experimental results show an opportunity for a 17% speedup in realistic scenarios. These speedups are robust to high-cost migration, decreasing by less than 1%. Overall, state-aware core specialization offers a new path forward for enhancing single-thread performance.
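Runtime migration based on detected behavior, combined with the migration inertia mentioned in the TL;DR, can be illustrated with a simple hysteresis rule. Under this rule, a thread moves to the core preferred by its current state only after that preference has held for a set number of consecutive intervals, and each move is charged a fixed cost. The following Python sketch is illustrative only and is not SAHM's actual scheduler. The state-to-core mapping, the per-interval times, the cost value, and the helpers schedule and total_time are all hypothetical.

```python
from typing import Dict, List, Tuple


def schedule(states: List[int], core_for_state: Dict[int, int],
             start_core: int, inertia: int) -> Tuple[List[int], int]:
    """Return the core used in each interval and the number of migrations.

    A migration happens only once the same target core has been preferred
    for `inertia` consecutive intervals; in this simplified model the
    interval that completes the streak already runs on the new core.
    """
    core = start_core
    pending, streak = None, 0
    placement, migrations = [], 0
    for s in states:
        want = core_for_state[s]
        if want == core:
            pending, streak = None, 0   # preference matches: no pressure to move
        elif want == pending:
            streak += 1                 # same target as last interval
        else:
            pending, streak = want, 1   # new target: restart the streak
        if pending is not None and streak >= inertia:
            core, migrations = pending, migrations + 1
            pending, streak = None, 0
        placement.append(core)
    return placement, migrations


def total_time(states: List[int], placement: List[int], migrations: int,
               interval_time: Dict[Tuple[int, int], float],
               migration_cost: float) -> float:
    """Per-interval run time on the chosen core plus a fixed cost per migration."""
    run = sum(interval_time[(core, s)] for core, s in zip(placement, states))
    return run + migrations * migration_cost


# Hypothetical two-core system: core 0 suits state 0, core 1 suits state 1.
core_for_state = {0: 0, 1: 1}
interval_time = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}
trace = [0, 1, 0, 1, 1, 1]
for inertia in (1, 2):
    placement, migrations = schedule(trace, core_for_state, 0, inertia)
    print(inertia, placement, migrations,
          total_time(trace, placement, migrations, interval_time, 0.5))
```

With these made-up numbers, an inertia of 1 migrates three times (total time 7.5) and an inertia of 2 migrates once (total time 8.5). Once the per-migration cost exceeds 1.0, the higher inertia wins. This illustrates the trade-off between chasing short-lived states and paying migration overhead.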
