A Robust Power Model Training Framework for Cloud Native Runtime Energy Metric Exporter

Sunyanan Choochotkaew; Chen Wang; Huamin Chen; Tatsuhiro Chiba; Marcelo Amaral; Eun Kyung Lee; Tamar Eilam

TL;DR

Problem: Estimating per-container power in multi-tenant clouds without platform access or online measurements. Approach: a cloud-native training pipeline integrated with Kepler trains a system power model $M_{sys}$ on aggregated usage $U$ to predict system power $P$, then isolates background power via $P_{U-x}$ and computes workload power $\Delta P_x = P - P_{U-x}$, selecting the best candidate by isolation goodness $\rho$ with threshold $\rho_{th}$ before training a per-container model $M$ on labels $\Delta P$. Contributions: dynamic background power isolation, a formal isolation goodness metric, cross-workload/cross-platform validation with online training, and the ability to train without platform data. Significance: enables non-RAPL container power estimation for unseen containers on unknown platforms, supporting energy-aware cloud management and carbon accounting.
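The isolation steps summarized in the TL;DR can be illustrated with a minimal sketch. Everything here beyond the symbols $M_{sys}$, $U$, $P$, $P_{U-x}$, $\Delta P_x$, $\rho$, and $\rho_{th}$ is an assumption made for illustration: a one-feature least-squares line stands in for $M_{sys}$, Pearson correlation between a container's usage and its $\Delta P_x$ stands in for the isolation goodness $\rho$ (whose actual definition is not given in this summary), the function names and the threshold value 0.8 are hypothetical, and the data is synthetic.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def pearson(xs, ys):
    """Pearson correlation; used here as an ASSUMED stand-in for rho."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def isolate(usage_by_container, power, rho_th=0.8):
    """Return (name, rho, delta_p) for the best candidate, or None.

    rho_th=0.8 is an arbitrary illustrative threshold.
    """
    n = len(power)
    # U: usage aggregated over all containers at each time step.
    total = [sum(u[t] for u in usage_by_container.values()) for t in range(n)]
    a, b = fit_linear(total, power)  # assumed linear M_sys: P ~ a*U + b
    best = None
    for name, u_x in usage_by_container.items():
        # P_{U-x}: predicted power with candidate x's usage removed.
        background = [a * (total[t] - u_x[t]) + b for t in range(n)]
        # Delta P_x = P - P_{U-x}
        delta = [power[t] - background[t] for t in range(n)]
        rho = pearson(u_x, delta)
        if best is None or rho > best[1]:
            best = (name, rho, delta)
    return best if best[1] >= rho_th else None

# Synthetic example: one varying workload plus a nearly flat background.
usage = {
    "workload": [0, 10, 20, 30, 40, 50],
    "background": [5, 5, 6, 5, 4, 5],
}
power = [51, 69, 91, 111, 129, 149]  # roughly 40 + 2*U with small noise

result = isolate(usage, power)
name, rho, delta_p = result
# Per-container model M trained on the isolated labels Delta P.
slope, intercept = fit_linear(usage[name], delta_p)
```

In this toy data the varying workload is selected because its isolated power tracks its usage closely, while the nearly constant background container correlates poorly with its own $\Delta P_x$. A real pipeline would use many usage features and a richer model; the single-feature line is only the smallest form that shows the subtraction and selection logic.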

Abstract

Estimating power consumption in modern cloud environments is essential for carbon quantification toward green computing. In particular, the power consumed by each running application, packaged as a container, must be accounted for properly. This paper examines two challenges associated with this goal. The first is multi-tenancy: multiple customers share the same hardware platform, and information on the physical servers is mostly obscured. The second is the power overhead induced by the cloud platform control plane. To address these challenges, this paper introduces a novel pipeline framework for power model training that enables versatile approximation of the power consumption of individual containers from available performance counters and other metrics. The proposed model uses machine learning techniques to predict the power consumed by the control plane and associated processes, and uses that prediction to isolate the power consumed by user containers from the total server power consumption. To determine how well the prediction isolates container power, we introduce a metric termed isolation goodness. Applying the proposed power model requires neither online power measurements nor information on the physical servers, their configuration, or other tenants sharing the same machine. Cross-workload, cross-platform experiments demonstrate that the proposed model achieves higher accuracy when predicting the power consumption of unseen containers on unknown platforms, including virtual machines.

Paper Structure (22 sections, 12 equations, 13 figures, 2 tables, 1 algorithm)

Introduction
Problem Definition
Related Works
Machine learning approach and features
Power isolation and model labeling
Power model training pipeline framework
Training pipeline with the proposed power isolation
Step 1: System power model training
Step 2: Background power prediction
Step 3: Power labeling
Step 4: Container power model training
Step 5: Online power model training
Cross validation
Evaluation Results
Comparison models
...and 7 more sections

Figures (13)

Figure 1: (a) Workload usage is not always correlated to (b) power consumption due to noisy background processes.
Figure 2: Dynamic power isolation for model training.
Figure 3: Snapshot of normalized Kepler metrics showing high correlation between resource usage and power consumption when running Coremark benchmark.
Figure 4: Correlation between resource usage from different metric producers and RAPL power for each benchmark.
Figure 5: Non-RAPL power modeling.
...and 8 more figures
