FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights
Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim
TL;DR
FinGraV tackles the challenge of obtaining fine-grain GPU power visibility for sub-millisecond AI kernels with a methodology that combines execution-time binning, careful CPU-GPU time synchronization, and power-profile differentiation. Applied to the AMD Instinct MI300X, FinGraV yields time-resolved, component-aware power profiles for GEMM and communication kernels, showing how the XCD, IOD, and HBM contribute to power under different algorithmic regimes and interleaving scenarios. The work provides concrete profiling guidance, shows that misinterpreting power profiles can introduce errors of up to ~80%, and offers design directions for power-aware GPU optimization and concurrent execution strategies. Overall, FinGraV enables accurate power measurement and deeper insights that can drive energy-efficient hardware and software improvements for the GPU accelerators that are ubiquitous in AI workloads.
Abstract
The ubiquity of AI makes optimizing GPU power a priority, as large GPU-based clusters are often employed to train and serve AI models. An important first step in optimizing GPU power consumption is high-fidelity, fine-grain power measurement of key AI computations on GPUs. We observe that as GPUs get more powerful, the resulting sub-millisecond to millisecond executions make fine-grain power analysis challenging. In this work, we first carefully identify the challenges in obtaining fine-grain GPU power profiles. To address these challenges, we devise the FinGraV methodology, which employs execution-time binning, careful CPU-GPU time synchronization, and power-profile differentiation to collect fine-grain GPU power profiles across prominent AI computations and a spectrum of scenarios. Using these FinGraV power profiles, we provide both guidance on accurate power measurement and an in-depth view of power consumption on the state-of-the-art AMD Instinct MI300X. For the former, we highlight a methodology for power differentiation across executions. For the latter, we make several observations pertaining to GPU sub-component power consumption and GPU power proportionality across different scenarios. We believe that FinGraV unlocks both an accurate and a deeper view of GPU power consumption and opens up avenues for power optimization of these ubiquitous accelerators.
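
To make two of the named ingredients concrete, the following is a minimal illustrative sketch of execution-time binning and CPU-GPU timestamp alignment. It is not the FinGraV implementation: the abstract does not specify the procedure, so every function name, the fixed bin width, and the mean-difference clock-offset estimate are assumptions chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def bin_runs_by_duration(durations_us, bin_width_us):
    """Group run indices into bins of similar execution time.

    Assumption: fixed-width bins; the paper may bin differently.
    """
    bins = defaultdict(list)
    for idx, duration in enumerate(durations_us):
        bins[int(duration // bin_width_us)].append(idx)
    return dict(bins)


def estimate_offset(cpu_ts_us, gpu_ts_us):
    """Estimate the CPU-minus-GPU clock offset from paired timestamp readings.

    Assumption: a constant offset, estimated as the mean pairwise difference.
    """
    return mean(c - g for c, g in zip(cpu_ts_us, gpu_ts_us))


def samples_in_kernel(samples, kernel_start_us, kernel_end_us, offset_us):
    """Keep the power samples that fall inside one kernel execution.

    samples: list of (cpu_timestamp_us, watts) pairs logged on the CPU side.
    Returns (time_since_kernel_start_us, watts) pairs on the GPU timeline.
    """
    aligned = []
    for cpu_ts, watts in samples:
        gpu_ts = cpu_ts - offset_us  # move the sample onto the GPU clock
        if kernel_start_us <= gpu_ts < kernel_end_us:
            aligned.append((gpu_ts - kernel_start_us, watts))
    return aligned
```

In this sketch, runs that land in the same bin can be treated as comparable executions, and their aligned power samples can be overlaid on a common kernel-relative time axis to build a finer-grained profile than any single run provides.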
