The Energy Cost of Execution-Idle in GPU Clusters

Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler

Abstract

GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests a need to both reduce the cost of execution-idle and reduce exposure to it. We therefore build two prototypes: one uses automatic downscaling during execution-idle, and the other uses load imbalance to reduce exposure, both with performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
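
The time and energy fractions quoted above can be illustrated with a minimal sketch of how such an accounting might be computed from per-second telemetry. This is not the authors' method: the `Sample` type, the function name, and both thresholds (`util_threshold`, `power_threshold_w`) are hypothetical, chosen only to make the definition "low activity yet high power" concrete.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Sample:
    """One per-second telemetry reading for a GPU (hypothetical schema)."""
    sm_util: float   # SM utilization in percent, 0-100
    power_w: float   # average power over the second, in watts


def execution_idle_fractions(
    samples: Iterable[Sample],
    util_threshold: float = 5.0,       # assumed cutoff for "near-zero activity"
    power_threshold_w: float = 100.0,  # assumed cutoff for "high power"
) -> Tuple[float, float]:
    """Return (time fraction, energy fraction) spent in execution-idle.

    A sample counts as execution-idle when utilization is at or below
    util_threshold while power is at or above power_threshold_w.
    Each sample covers one second, so watts sum directly to joules.
    """
    samples = list(samples)
    total_time = len(samples)
    total_energy = sum(s.power_w for s in samples)
    if total_time == 0 or total_energy == 0:
        return 0.0, 0.0

    idle = [
        s for s in samples
        if s.sm_util <= util_threshold and s.power_w >= power_threshold_w
    ]
    idle_energy = sum(s.power_w for s in idle)
    return len(idle) / total_time, idle_energy / total_energy


# Usage: 8 busy seconds at 300 W followed by 2 low-activity seconds at 100 W.
trace = [Sample(90.0, 300.0)] * 8 + [Sample(0.0, 100.0)] * 2
time_frac, energy_frac = execution_idle_fractions(trace)
```

Because execution-idle power is lower than active power, the energy fraction comes out smaller than the time fraction, which matches the direction of the reported 19.7% of time versus 10.7% of energy.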

Paper Structure

This paper contains 25 sections, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: CPU power falls with idle time, but GPU power remains elevated even when a loaded program is fully idle.
  • Figure 2: Time-aligned power, SM and DRAM utilization, and normalized frequency for a job on an L40S GPU, illustrating the execution-idle state.
  • Figure 3: Cluster-scale GPU energy accounting over the study window. The left panel compares observed GPU energy while the right panel decomposes job-attributed GPU time and energy by regime.
  • Figure 4: Power in the execution-idle state remains substantially above deep idle across all GPU models in our study.
  • Figure 5: Execution-idle time and energy fractions across academic workload categories and replayed industry serving traces.
  • ...and 7 more figures