FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights

Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim

TL;DR

FinGraV tackles the challenge of obtaining fine-grain GPU power visibility for sub-millisecond AI kernels by introducing a methodology that combines execution-time binning, careful CPU-GPU time synchronization, and power-profile differentiation. Applied to the AMD Instinct MI300X, FinGraV yields time-resolved, component-aware power profiles for GEMM and communication kernels, revealing how the XCDs (accelerator complex dies), IODs (I/O dies), and HBM contribute to power under different algorithmic regimes and interleaving scenarios. The work provides concrete profiling guidance, shows that misinterpreting power profiles can introduce errors of up to ~80%, and offers design directions for power-aware GPU optimization and concurrent execution strategies. Overall, FinGraV enables accurate power measurement and deeper insights that can drive energy-efficient hardware and software improvements for the GPU accelerators now ubiquitous in AI workloads.
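The TL;DR names execution-time binning but does not spell it out. The sketch below illustrates it under an assumption: that binning groups repeated runs of the same kernel by measured duration and keeps only the most populated bin, so that power samples are aggregated only over runs of comparable length. The helper names and the `bin_width_us` parameter are hypothetical, not FinGraV's actual interface.

```python
from typing import Dict, List, Tuple


def bin_runs_by_duration(durations_us: List[float],
                         bin_width_us: float) -> Dict[int, List[int]]:
    """Group run indices into fixed-width execution-time bins (hypothetical helper)."""
    bins: Dict[int, List[int]] = {}
    for run_idx, duration in enumerate(durations_us):
        # Runs whose durations fall in the same bin_width_us-wide window share a key.
        key = int(duration // bin_width_us)
        bins.setdefault(key, []).append(run_idx)
    return bins


def dominant_bin(durations_us: List[float],
                 bin_width_us: float) -> Tuple[int, List[int]]:
    """Return the most populated bin and its runs; runs in other bins are treated as outliers."""
    bins = bin_runs_by_duration(durations_us, bin_width_us)
    key = max(bins, key=lambda k: len(bins[k]))
    return key, bins[key]


# Six runs of the same kernel (illustrative durations); run 3 is a slow outlier.
durations = [410.0, 415.0, 412.0, 530.0, 418.0, 409.0]
print(dominant_bin(durations, bin_width_us=20.0))  # (20, [0, 1, 2, 4, 5])
```

Fixed-width bins are the simplest choice; a real setup might instead derive the width from the timer resolution or from the observed spread of durations.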

Abstract

The ubiquity of AI makes optimizing GPU power a priority, as large GPU-based clusters are often employed to train and serve AI models. An important first step in optimizing GPU power consumption is high-fidelity, fine-grain power measurement of key AI computations on GPUs. In this context, we observe that as GPUs become more powerful, the resulting sub-millisecond to millisecond executions make fine-grain power analysis challenging. In this work, we first carefully identify the challenges in obtaining fine-grain GPU power profiles. To address these challenges, we devise the FinGraV methodology, which employs execution time binning, careful CPU-GPU time synchronization, and power profile differentiation to collect fine-grain GPU power profiles across prominent AI computations and a spectrum of scenarios. Using these FinGraV power profiles, we provide both guidance on accurate power measurement and an in-depth view of power consumption on the state-of-the-art AMD Instinct MI300X. For the former, we highlight a methodology for power differentiation across executions. For the latter, we make several observations pertaining to GPU sub-component power consumption and GPU power proportionality across different scenarios. We believe that FinGraV unlocks both an accurate and a deeper view of GPU power consumption and opens up avenues for power optimization of these ubiquitous accelerators.
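The abstract mentions a methodology for power differentiation across executions without detailing it. One plausible reading, assumed for the sketch below and not taken from the paper, is that a kernel run back-to-back does not draw the same power in every execution (for example, early runs before power has ramped up versus later runs), so profiles from different phases should be separated before reporting a single number. The `warmup_runs` split point and the helper names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def run_averages(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Average power (W) of each execution's sampled profile."""
    return [mean(p) for p in profiles]


def differentiate(profiles: Sequence[Sequence[float]],
                  warmup_runs: int) -> Tuple[float, float, float]:
    """Split executions into early and late groups by position in the back-to-back
    sequence; return each group's mean power and the relative gap (hypothetical)."""
    avgs = run_averages(profiles)
    early = mean(avgs[:warmup_runs])
    late = mean(avgs[warmup_runs:])
    # Relative error incurred if the early-run average were reported as the kernel's power.
    gap = abs(early - late) / late
    return early, late, gap


# Two early executions followed by three later ones (illustrative numbers only).
profiles = [[300.0, 320.0], [340.0, 360.0],
            [550.0, 570.0], [560.0, 580.0], [555.0, 565.0]]
early, late, gap = differentiate(profiles, warmup_runs=2)
print(f"early={early:.1f} W, late={late:.1f} W, gap={gap:.0%}")
```

The numbers above are invented for the example and are not measurements from the paper; in practice the split point would have to be chosen from observed profiles rather than fixed in advance.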

Paper Structure

This paper contains 19 sections, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: FinGraV addresses challenges in fine-grain GPU power analysis.
  • Figure 2: An illustration (not to scale) of AMD Instinct™ MI300X, the GPU used in this work. The cross-sectional view (bottom) shows the stacking.
  • Figure 3: Challenges in doing fine-grain GPU power analysis.
  • Figure 4: FinGraV strategies to address challenges in fine-grain GPU power analysis.
  • Figure 5: FinGraV methodology evaluation for (a) benefit of CPU-GPU time sync, (b) effect of kernel execution time binning, and (c) resiliency to #runs using CB-4K-GEMM power profiles under different scenarios.
  • ...and 5 more figures
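Figure 5(a) evaluates the benefit of CPU-GPU time synchronization. As a minimal sketch, assuming kernel start/end timestamps come from a GPU clock while power samples are timestamped on the CPU clock, one can fit a linear map between the two clock domains from paired readings, then place each power sample at its relative position inside the kernel execution it overlaps. Folding many runs this way is one way to build a time-resolved profile for a kernel shorter than the power sampling period. All names here are hypothetical; this is not FinGraV's implementation.

```python
from typing import List, Optional, Sequence, Tuple


def fit_clock_map(gpu_ts: Sequence[float],
                  cpu_ts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of cpu ~= slope * gpu + offset from paired clock readings."""
    n = len(gpu_ts)
    mean_g = sum(gpu_ts) / n
    mean_c = sum(cpu_ts) / n
    cov = sum((g - mean_g) * (c - mean_c) for g, c in zip(gpu_ts, cpu_ts))
    var = sum((g - mean_g) ** 2 for g in gpu_ts)
    slope = cov / var  # absorbs the tick-rate ratio and any linear drift
    return slope, mean_c - slope * mean_g


def kernels_to_cpu(kernels_gpu: Sequence[Tuple[float, float]],
                   slope: float, offset: float) -> List[Tuple[float, float]]:
    """Convert GPU-clock (start, end) pairs into the CPU clock domain."""
    return [(slope * s + offset, slope * e + offset) for s, e in kernels_gpu]


def fold_samples(samples: Sequence[Tuple[float, float]],
                 kernels_cpu: Sequence[Tuple[float, float]],
                 n_slots: int) -> List[Optional[float]]:
    """Average (cpu_time, watts) samples by normalized position inside a kernel run.

    Samples outside every kernel interval are ignored; slots that no sample
    landed in are reported as None.
    """
    sums = [0.0] * n_slots
    counts = [0] * n_slots
    for t, watts in samples:
        for start, end in kernels_cpu:
            if start <= t < end:
                # Clamp guards against floating-point rounding right at the end edge.
                slot = min(int((t - start) / (end - start) * n_slots), n_slots - 1)
                sums[slot] += watts
                counts[slot] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Paired readings: GPU clock in ticks, CPU clock in ms (illustrative values).
slope, offset = fit_clock_map([0.0, 1000.0, 2000.0], [5.0, 6.0, 7.0])
kernels = kernels_to_cpu([(0.0, 1000.0), (2000.0, 3000.0)], slope, offset)
samples = [(5.1, 100.0), (7.6, 200.0), (6.5, 50.0)]  # (cpu_ms, watts)
print(fold_samples(samples, kernels, n_slots=2))  # [100.0, 200.0]
```

Without a clock map, a fixed offset or drift between the two clocks would shift samples into the wrong slot or outside the kernel interval entirely.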