
Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Eishi Arima, Minjoon Kang, Issa Saba, Josef Weidendorfer, Carsten Trinitis, Martin Schulz

TL;DR

This work tackles the challenge of underutilization and high power consumption in CPU-GPU HPC systems by proposing a co-scheduling framework that jointly optimizes hardware-level GPU partitioning (via MIG), job allocations, and power budgets under power caps. The authors introduce an offline/online workflow that trains a linear-regression performance model using hardware counters and then solves two optimization problems: maximizing throughput under fairness with a fixed power cap, and maximizing throughput per unit power with fairness. Their model captures both scalability and interference effects through $RPerf_{Appi}(S,P)=C(S,P)\cdot H(F_{Appi})+\sum_{j\neq i}D(S,P)\cdot J(F_{Appj})$, with coefficients learned for each $(S,P)$ configuration. Evaluation on an NVIDIA A100 with MIG demonstrates accurate predictions (average errors ~$9.7\%$ for throughput and $14.5\%$ for fairness) and near-optimal throughput/energy efficiency across diverse workloads, validating the method’s practical potential. The work advances power-aware, hardware-partitioned co-scheduling and paves the way for integration with cluster schedulers like SLURM to optimize resource use in real HPC deployments.
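The model above is linear in its coefficients, so for each $(S,P)$ configuration the vectors $C(S,P)$ and $D(S,P)$ can be fitted by ordinary least squares over co-run measurements. The Python sketch below illustrates that fitting step under assumptions that are not taken from the paper. It treats $H$ and $J$ as the identity on a per-application feature vector $F$ (for example, normalized hardware-counter values), omits an intercept term, and uses the hypothetical helper names `design_row`, `fit_config`, and `predict`.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def design_row(features: Sequence[Vector], i: int) -> Vector:
    """Inputs for app i: its own features F_i followed by the summed
    features of all co-located apps j != i (the interference term)."""
    own = list(features[i])
    others = [0.0] * len(own)
    for j, f in enumerate(features):
        if j != i:
            others = [a + b for a, b in zip(others, f)]
    return own + others

def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_config(samples) -> Tuple[Vector, Vector]:
    """Least-squares fit of (C, D) for ONE (S, P) configuration.

    samples: list of (features, rperf) pairs from co-runs under that
    configuration; features holds one vector per co-located app and
    rperf the measured relative performance of each app."""
    X, y = [], []
    for features, rperf in samples:
        for i, r in enumerate(rperf):
            X.append(design_row(features, i))
            y.append(r)
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * t for row, t in zip(X, y)) for a in range(k)]
    w = solve(XtX, Xty)
    return w[: k // 2], w[k // 2 :]

def predict(C: Vector, D: Vector, features: Sequence[Vector]) -> Vector:
    """RPerf_i = C . F_i + D . sum_{j != i} F_j, for every co-located app i."""
    w = list(C) + list(D)
    return [sum(a * b for a, b in zip(w, design_row(features, i)))
            for i in range(len(features))]
```

Because $D(S,P)$ is shared by all co-runners, the interference term collapses to $D(S,P)\cdot\sum_{j\neq i}J(F_{Appj})$. This is why `design_row` simply sums the other applications' features. A production version would more likely call a numerical library's least-squares routine. The hand-written normal-equation solver here only keeps the sketch dependency-free.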

Abstract

CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy efficiency of such systems remains one of the most critical issues. Because a single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become a first-class design constraint for HPC systems, such co-scheduling techniques should be well-tailored to power-constrained environments. To this end, industry has recently begun supporting hardware-level resource partitioning features on modern GPUs to enable efficient co-scheduling, and these features can operate together with existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions a single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to jointly optimize chip partitioning, job allocations, and power capping based on our scalability/interference modeling, while taking into account a variety of aspects, such as compute/memory intensity and utilization of heterogeneous computational resources (e.g., Tensor Cores). The experimental results indicate that our approach selects a near-optimal combination across multiple different workloads.
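Once a performance predictor exists, the joint choice of partitioning, job allocation, and power cap described above can be made by exhaustive enumeration when only a few jobs are co-located. The following Python sketch illustrates both objectives named in the TL;DR: throughput subject to a fairness floor under a fixed cap, and throughput per watt subject to the same floor. It is not the authors' solver. The fairness metric (worst-off over best-off relative performance), the use of the cap as a stand-in for consumed power, the GPC split sizes, and the `toy_predict` model are all illustrative assumptions.

```python
from itertools import permutations
from typing import Callable, Iterator, List, Sequence, Tuple

Partition = Tuple[int, ...]  # hypothetical: number of GPCs given to each instance
Predictor = Callable[[Partition, float, Tuple[str, ...]], List[float]]

def fairness(rperf: Sequence[float]) -> float:
    """Assumed metric: worst-off relative performance over best-off."""
    return min(rperf) / max(rperf)

def candidates(jobs: Sequence[str], partitions: Sequence[Partition],
               caps: Sequence[float], predict: Predictor) -> Iterator[tuple]:
    """Enumerate every (partition, cap, job-to-instance mapping) and yield
    (throughput, fairness, cap, partition, mapping)."""
    for part in partitions:
        if len(part) != len(jobs):
            continue  # this sketch places exactly one job per instance
        for cap in caps:
            for order in permutations(jobs):
                r = predict(part, cap, order)
                yield sum(r), fairness(r), cap, part, order

def best_under_cap(jobs, partitions, cap, predict, alpha):
    """Problem 1: maximize throughput with fairness >= alpha at a fixed cap."""
    best = None
    for tp, fr, _, part, order in candidates(jobs, partitions, [cap], predict):
        if fr >= alpha and (best is None or tp > best[0]):
            best = (tp, part, order)
    return best

def best_per_watt(jobs, partitions, caps, predict, alpha):
    """Problem 2: maximize throughput / power with fairness >= alpha,
    approximating consumed power by the cap."""
    best = None
    for tp, fr, cap, part, order in candidates(jobs, partitions, caps, predict):
        if fr >= alpha and (best is None or tp / cap > best[0]):
            best = (tp / cap, cap, part, order)
    return best

def toy_predict(part, cap, order):
    """Hypothetical stand-in for a fitted model: job "A" is compute-bound
    (scales with GPCs and power), job "B" is memory-bound (saturates at 3 GPCs)."""
    out = []
    for gpcs, job in zip(part, order):
        if job == "A":
            out.append(gpcs / 7 * cap / 250)
        else:
            out.append(min(1.0, gpcs / 3) * 0.9)
    return out

if __name__ == "__main__":
    splits = [(4, 3), (5, 2), (6, 1)]
    print(best_under_cap(("A", "B"), splits, 250.0, toy_predict, 0.5))
    print(best_per_watt(("A", "B"), splits, [150.0, 250.0], toy_predict, 0.5))
```

With the toy predictor, the fixed-cap search at 250 W picks the (4, 3) split with the compute-bound job on the larger instance. The per-watt search instead prefers the lower 150 W cap on a (5, 2) split. The enumeration grows factorially with the number of co-located jobs, which stays manageable for the at most seven instances an A100 supports. The split sizes here are illustrative and are not checked against the MIG profiles the hardware actually exposes.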

Paper Structure

This paper contains 26 sections, 5 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Our Assumed HPC System and Our Scope
  • Figure 2: MIG with Private LLC/HBM Option
  • Figure 3: MIG with Shared LLC/HBM Option
  • Figure 4: Scalability Observations for Different Partitioning Options across Different Benchmarks (Power Cap: 250[W])
  • Figure 6: Impact of Resource Partitioning/Allocations on Co-scheduling Throughput (Power Cap: 250[W])
  • ...and 6 more figures