Profiling and Modeling of Power Characteristics of Leadership-Scale HPC System Workloads
Ahmad Maroof Karimi, Naw Safrin Sattar, Woong Shin, Feiyi Wang
TL;DR
The paper tackles the challenge of profiling and modeling the power consumption of leadership-scale HPC workloads to enable energy-aware operation. It proposes a low-latency ML pipeline that converts high-resolution power telemetry into fixed-length feature representations via a 186-feature extractor and a GAN-based latent space, then applies DBSCAN clustering to group more than 60K Summit jobs into 119 contextualized classes. An open-set, CAC-based classifier handles previously unseen patterns, and an iterative workflow with periodic retraining maintains accuracy as workloads evolve. The approach yields actionable insights into the power landscape across science domains and performs robustly in both closed-set and open-set settings, offering a path toward energy-aware resource management in exascale systems.
Abstract
In the exascale era, in which application behavior has large power and energy footprints, per-application, job-level awareness of those footprints is crucial for pursuing goals beyond performance, such as energy efficiency and sustainability. To this end, we have developed a novel low-latency machine learning pipeline for job power profiling that groups job-level power profiles by shape as jobs complete. The pipeline combines comprehensive feature extraction with clustering, powered by a generative adversarial network (GAN) model, to handle the feature-rich time series of job-level power measurements. The resulting clusters are then used to train a classification model that predicts whether an incoming job power profile is similar to a known group of profiles or is completely new. Through extensive evaluations, we demonstrate the effectiveness of each component of the pipeline. We also provide a preliminary analysis of the resulting clusters, which depict the power profile landscape of the Summit supercomputer based on more than 60K jobs sampled from 2021.
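To make the idea of shape-based grouping with a "known or new" decision concrete, the following is a minimal sketch and not the authors' implementation. It assumes a handful of hand-picked summary statistics in place of the paper's feature extractor and GAN latent space, and a simple nearest-centroid rule with a distance threshold in place of the paper's open-set classifier. The function names, the toy traces, and the threshold value are all hypothetical.

```python
import math
from statistics import fmean, pstdev


def extract_features(power):
    """Map a variable-length power trace to a fixed-length vector.

    Assumption: five simple statistics stand in for a much richer
    feature set; they are chosen only for illustration.
    """
    diffs = [abs(b - a) for a, b in zip(power, power[1:])]
    return [
        fmean(power),                    # average power
        pstdev(power),                   # variability
        min(power),                      # floor
        max(power),                      # peak
        fmean(diffs) if diffs else 0.0,  # mean step-to-step swing
    ]


def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def classify_open_set(features, centroids, threshold):
    """Return the nearest known label, or None if no class is close enough.

    Assumption: Euclidean distance to class centroids with a fixed
    threshold is a stand-in for a learned open-set classifier.
    """
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(features, center)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


# Hypothetical toy traces of different lengths, grouped by shape.
flat_jobs = [[1000.0] * 10, [1010.0] * 25]
bursty_jobs = [[500.0, 1500.0] * 5, [520.0, 1480.0] * 12]

centroids = {
    "flat": centroid([extract_features(p) for p in flat_jobs]),
    "bursty": centroid([extract_features(p) for p in bursty_jobs]),
}

THRESHOLD = 100.0  # hypothetical cutoff, in feature-space units

ramp_job = [100.0 * i for i in range(1, 31)]  # a shape seen in no class
for name, trace in [("steady", [1005.0] * 40), ("ramp", ramp_job)]:
    label = classify_open_set(extract_features(trace), centroids, THRESHOLD)
    print(name, "->", label if label is not None else "new pattern")
```

In this sketch, a trace that falls outside the threshold of every known class is flagged as a new pattern. In a workflow like the one the abstract describes, such profiles would be candidates for later re-clustering and retraining rather than being forced into an existing group.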
