Learning Process Energy Profiles from Node-Level Power Data

Jonathan Bader, Julius Irion, Jannis Kappel, Joel Witzke, Niklas Fomin, Diellza Sherifi, Odej Kao

TL;DR

The paper tackles the lack of fine-grained per-process energy data in data centers. It proposes a hardware-agnostic framework that learns per-process energy by regressing process-level resource metrics, gathered via eBPF and perf, against node-level energy measurements from a smart meter. A convex, $\ell_1$-regularized regression with a nonnegative static baseline, solved via CVXPY, yields per-process energy weights and a baseline energy. Evaluation on commodity hardware with Phoronix workloads shows that the model explains roughly 59% of the variance in interval energy ($R^2 \approx 0.59$) with a mean absolute error of about $17.67$ joules per interval, roughly $3.5\%$ of the mean interval energy, indicating good temporal fidelity and interpretable attributions. The approach supports energy-aware scheduling and DVFS-informed decisions without relying on hardware-specific interfaces such as Intel RAPL; it needs only a node-level power meter.
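To make the regression idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear model in which node energy per interval is $b + \sum_j w_j x_j$, with a shared weight $w_j$ per resource feature, a baseline $b \ge 0$, and an objective of the form $\frac{1}{2n}\lVert Xw + b - y\rVert_2^2 + \lambda \lVert w \rVert_1$. The paper solves its problem with CVXPY; this sketch swaps in a plain proximal gradient loop so that it runs with the standard library only. All function names, feature choices, and numbers are hypothetical.

```python
# Sketch only: l1-regularized least squares with a nonnegative baseline,
# solved by proximal gradient descent (a stand-in for the CVXPY solve).

def soft_threshold(value, threshold):
    """Proximal operator of the l1 penalty for a single coefficient."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_energy_model(X, y, lam=0.01, iters=5000):
    """Return (weights, baseline) minimizing
    (1/2n) * sum((b + w.x_t - y_t)^2) + lam * sum(|w_j|), subject to b >= 0."""
    n, d = len(X), len(X[0])
    # Safe step size: 1 / trace(A^T A / n) with A = [X, 1]; the trace bounds
    # the largest eigenvalue of the (positive semidefinite) Hessian.
    lipschitz_bound = (sum(v * v for row in X for v in row) + n) / n
    step = 1.0 / lipschitz_bound
    w = [0.0] * d
    b = 0.0
    for _ in range(iters):
        resid = [b + sum(wj * xj for wj, xj in zip(w, row)) - yt
                 for row, yt in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(d)]
        grad_b = sum(resid) / n
        # Gradient step, then shrink the weights and project the baseline.
        w = [soft_threshold(wj - step * g, step * lam)
             for wj, g in zip(w, grad_w)]
        b = max(0.0, b - step * grad_b)
    return w, b


def attribute_energy(weights, per_process_features):
    """Dynamic energy per process: weighted sum of that process's features."""
    return {pid: sum(w * x for w, x in zip(weights, feats))
            for pid, feats in per_process_features.items()}


# Hypothetical node-level data: two aggregated features per interval and the
# metered energy in joules, generated from y = 50 + 30*x0 + 5*x1.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [2, 2], [0, 2]]
y = [50 + 30 * x0 + 5 * x1 for x0, x1 in X]

weights, baseline = fit_energy_model(X, y)

# Hypothetical per-process features for one interval; they sum to [2, 2].
per_process = {"proc_a": [1.5, 0.5], "proc_b": [0.5, 1.5]}
shares = attribute_energy(weights, per_process)
```

Because the assumed model is linear in the features, the per-process shares plus the baseline add up to the predicted node energy for the interval. The $\ell_1$ penalty biases the weights slightly toward zero, so the recovered values are close to, but not exactly, the generating coefficients.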

Abstract

The growing demand for data center capacity, driven by the growth of high-performance computing, cloud computing, and especially artificial intelligence, has led to a sharp increase in data center energy consumption. To improve energy efficiency, gaining process-level insights into energy consumption is essential. While node-level energy consumption data can be directly measured with hardware such as power meters, existing mechanisms for estimating per-process energy usage, such as Intel RAPL, are limited to specific hardware and provide only coarse-grained, domain-level measurements. Our proposed approach models per-process energy profiles by leveraging fine-grained process-level resource metrics collected via eBPF and perf, which are synchronized with node-level energy measurements obtained from an attached power distribution unit. By statistically learning the relationship between process-level resource usage and node-level energy consumption through a regression-based model, our approach enables more fine-grained per-process energy predictions.
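The synchronization step described in the abstract can be pictured with a short sketch. This is an illustration under stated assumptions, not the paper's pipeline: it assumes fixed-length intervals, that each meter reading carries the start timestamp of its interval, and that per-process samples are additive counters. The tuple layout, the feature name `cpu_ns`, and both function names are hypothetical.

```python
from collections import defaultdict


def aggregate_by_interval(samples, interval_s):
    """Sum per-process feature samples into fixed-length time buckets.

    samples: iterable of (timestamp_s, pid, feature_name, value).
    Returns {interval_index: {pid: {feature_name: summed_value}}}.
    """
    buckets = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for ts, pid, name, value in samples:
        idx = int(ts // interval_s)
        buckets[idx][pid][name] += value
    return buckets


def join_with_energy(buckets, meter_readings, interval_s):
    """Pair each meter reading with the process features of its interval.

    meter_readings: iterable of (interval_start_s, joules).
    Intervals without any process samples are skipped.
    """
    rows = []
    for ts, joules in meter_readings:
        idx = int(ts // interval_s)
        if idx in buckets:
            features = {pid: dict(f) for pid, f in buckets[idx].items()}
            rows.append((idx, features, joules))
    return rows


# Hypothetical samples and meter readings with a 1-second interval.
samples = [
    (0.2, 101, "cpu_ns", 5.0),
    (0.7, 101, "cpu_ns", 3.0),
    (0.9, 202, "cpu_ns", 1.0),
    (1.1, 101, "cpu_ns", 2.0),
]
meter = [(0.0, 500.0), (1.0, 510.0), (2.0, 495.0)]

rows = join_with_energy(aggregate_by_interval(samples, 1.0), meter, 1.0)
```

Each resulting row holds one interval's per-process features next to the metered node energy, which is the shape of input a regression over intervals would need. Real deployments would also have to handle clock skew between the monitoring host and the meter, which this sketch ignores.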

Paper Structure

This paper contains 9 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: Overall system architecture and the interaction between monitoring component, aggregator component, and estimator component.
  • Figure 2: Spearman correlation of monitored features with interval energy.
  • Figure 3: Estimated overall energy versus real overall energy.
  • Figure 4: Per-process energy estimated by the model.