Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model
Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu
TL;DR
This work addresses thermal challenges in real-time multi-core CPUs by introducing POD-TAS, a thermal-aware scheduling method that leverages a dynamic Proper Orthogonal Decomposition (POD) thermal model to predict transient temperatures. Unlike steady-state approaches, POD-TAS uses a pair of temperature thresholds to idle and resume cores, guided by per-core state transitions, enabling accurate, high-resolution temperature control without relying on DVFS. The authors develop a simulation-based evaluation framework using gem5, McPAT, and FEniCS, and demonstrate that POD-TAS substantially reduces peak temperatures and spatial thermal variance compared with RT-TAS, achieving near-FEM accuracy with a 30-mode POD model. The findings suggest that dynamic, physics-informed reduced-order modeling can significantly improve TAS performance and reliability, with potential extensions to GPU co-scheduling and data-center-scale deployments.
Abstract
Modern real-time systems consume considerable power while executing computation-intensive tasks. The resulting power dissipation heats the device and causes severe thermal issues such as temperature escalation, high thermal gradients, and excessive hot spot formation, which may degrade chip performance, accelerate device aging, and lead to premature failure. Thermal-Aware Scheduling (TAS) enables optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of a multi-core CPU based on a defined set of states and their transitions. We compare the performance of a dynamic RC thermal circuit simulator (HotSpot) and a reduced-order Proper Orthogonal Decomposition (POD)-based thermal model, and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms, and we use it to evaluate the proposed POD-TAS algorithm against a state-of-the-art TAS algorithm, RT-TAS. The COMBS benchmark suite provides the CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance, decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the two algorithms shows a decrease of 29.57% in the peak spatial variance of the chip temperature and 26.26% in the peak chip temperature. We also identify several potential future research directions.
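The state-based control summarized above, in which a core is idled and later resumed according to a pair of temperature thresholds, can be illustrated with a minimal sketch. This is not the POD-TAS implementation. The names `CoreState`, `next_state`, and `simulate_core`, the two-state simplification, and the threshold values are hypothetical and chosen only for illustration. In POD-TAS the temperatures would come from the POD thermal model, whereas here they are simply passed in as a list.

```python
from enum import Enum


class CoreState(Enum):
    """Hypothetical two-state simplification of a per-core state set."""
    ACTIVE = "active"
    IDLE = "idle"


def next_state(state, predicted_temp, t_high, t_low):
    """Apply a two-threshold (hysteresis) rule to a single core.

    An active core is idled once its predicted temperature reaches t_high.
    An idle core is resumed once its predicted temperature falls to t_low.
    Between the two thresholds the core keeps its current state.
    """
    if t_low >= t_high:
        raise ValueError("t_low must be below t_high")
    if state is CoreState.ACTIVE and predicted_temp >= t_high:
        return CoreState.IDLE
    if state is CoreState.IDLE and predicted_temp <= t_low:
        return CoreState.ACTIVE
    return state


def simulate_core(predicted_temps, t_high=80.0, t_low=70.0):
    """Return the state chosen at each step for a sequence of temperatures.

    The default thresholds are illustrative values, not taken from the paper.
    """
    state = CoreState.ACTIVE
    trace = []
    for temp in predicted_temps:
        state = next_state(state, temp, t_high, t_low)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Assumed temperature sequence in degrees Celsius, for illustration only.
    for temp, state in zip([65, 75, 81, 75, 72, 69, 75],
                           simulate_core([65, 75, 81, 75, 72, 69, 75])):
        print(temp, state.value)
```

The gap between the two thresholds keeps a core from toggling rapidly when its temperature hovers near a single limit, which is the usual motivation for a threshold pair rather than one cutoff.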
