DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information
Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang
TL;DR
The paper tackles GPU energy efficiency under DVFS by fusing dynamic runtime signals and static code features into a single, low-overhead optimization framework. It introduces a parameterized energy-delay model that captures DVFS effects and uses machine learning to predict the model parameters from DCGM metrics and PTX features, enabling online configuration without heavy profiling. The approach improves energy efficiency by about 19% on Volta GPUs with at most 5% performance loss, and its predictions reach mean absolute percentage errors (MAPEs) in the low single digits. This fusion-based methodology offers a practical path for energy-aware GPU management in HPC and AI workloads, reducing profiling overhead while delivering tangible energy savings.
Abstract
The increasing reliance on graphics processing units (GPUs) for compute-intensive tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing DVFS solutions are either inefficient or inaccurate because they depend solely on dynamic or static information, respectively, which prevents them from being adopted in practical power management schemes. To this end, we propose a novel energy efficiency optimizer, called DSO, a lightweight solution that leverages both dynamic and static information to model and optimize GPU energy efficiency. DSO first proposes a theoretical energy efficiency model that reflects the DVFS roofline phenomenon and accounts for the tradeoff between performance and energy. It then applies machine learning techniques to predict the parameters of this model from both GPU kernel runtime metrics and static code features. Experiments on modern DVFS-enabled GPUs indicate that DSO can improve energy efficiency by 19% while keeping performance loss within 5%.
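The selection step described above can be illustrated with a minimal sketch. It is not DSO's model: the time and power formulas, the parameter names (`compute_mcycles`, `mem_time_s`, `p_static_w`, `k`), and all numeric values are assumptions chosen only to show a roofline-shaped DVFS response and a bounded performance loss. In DSO the model parameters would be predicted by machine learning rather than fixed by hand.

```python
def exec_time(freq_mhz, compute_mcycles, mem_time_s):
    """Assumed roofline-style time model: compute time shrinks with core
    frequency until a frequency-independent memory time becomes the floor."""
    return max(compute_mcycles / freq_mhz, mem_time_s)


def power(freq_mhz, p_static_w, k):
    """Assumed power model: static power plus a dynamic term that grows
    cubically with frequency (voltage taken as proportional to frequency)."""
    return p_static_w + k * (freq_mhz / 1000.0) ** 3


def pick_frequency(freqs_mhz, compute_mcycles, mem_time_s,
                   p_static_w, k, max_slowdown=0.05):
    """Return (freq, energy_j, time_s) with the lowest modeled energy among
    frequencies whose slowdown versus the highest frequency is within bound."""
    t_base = exec_time(max(freqs_mhz), compute_mcycles, mem_time_s)
    best = None
    for f in freqs_mhz:
        t = exec_time(f, compute_mcycles, mem_time_s)
        if t > t_base * (1.0 + max_slowdown):
            continue  # violates the performance-loss constraint
        e = power(f, p_static_w, k) * t
        if best is None or e < best[1]:
            best = (f, e, t)
    return best


# Hypothetical memory-bound kernel: lowering the clock from 1500 MHz to
# 1200 MHz does not change the modeled runtime but reduces modeled power.
choice = pick_frequency([600, 900, 1200, 1500],
                        compute_mcycles=1000.0, mem_time_s=0.9,
                        p_static_w=50.0, k=40.0)
print(choice)
```

For this memory-bound example the sketch picks the lowest clock that still sits on the flat part of the roofline, which is the intuition behind trading a bounded slowdown for lower energy.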
