SmartWatts: Self-Calibrating Software-Defined Power Meter for Containers

Guillaume Fieni, Romain Rouvoy, Lionel Seinturier

TL;DR

This paper tackles the challenge of fine-grained power profiling for software artifacts in data centers, where hardware meters lack container-level granularity and static software models struggle to generalize. It presents SmartWatts, a self-calibrating software-defined power meter that uses online sequential learning to adapt CPU and DRAM power models at runtime. For each resource, the RAPL-reported power is split into a static and a dynamic part, $p^{rapl}_{res} = p^{static}_{res} + p^{dyn}_{res}$, and the dynamic part is modeled as $p^{dyn}_{res} = M^f_{res} \; E^f_{res}$ using frequency-aware ridge regression; recalibration is triggered when the tracking error $\epsilon_{res}$ exceeds a threshold. The implementation is modular and open-source, comprising a client-side sensor and a server-side power meter that estimate per-container power at up to $2$ Hz with average errors around a few percent, while isolating static power and selecting correlated HwPC events to maintain accuracy across heterogeneous workloads. The approach enables scalable, container-level energy accounting for distributed systems, demonstrated on Kubernetes with real workloads and energy Sankey analyses, and thus supports energy-aware scheduling and optimization without requiring special hardware or lengthy training phases.
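The calibration loop summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PowerModel`, the helper `ridge_fit`, the sliding-window size, and all parameter values are hypothetical, and the sketch keeps a single model rather than one per CPU frequency. It only shows the general idea of subtracting a static power term, fitting a ridge regression from event counts to dynamic power, and refitting when the estimation error exceeds a threshold.

```python
from typing import List, Optional, Sequence


def ridge_fit(X: List[List[float]], y: List[float], alpha: float) -> List[float]:
    """Closed-form ridge regression: solve (X^T X + alpha * I) w = X^T y."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting (A is positive definite for alpha > 0).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


class PowerModel:
    """Hypothetical self-calibrating model: p_rapl ~= p_static + w . events."""

    def __init__(self, static_power: float, error_threshold: float,
                 alpha: float = 1e-6, window: int = 60) -> None:
        self.static_power = static_power        # assumed known (e.g. idle power)
        self.error_threshold = error_threshold  # tolerated error, in watts
        self.alpha = alpha                      # ridge regularization strength
        self.window = window                    # number of recent samples kept
        self.weights: Optional[List[float]] = None
        self.history: List[tuple] = []

    def estimate(self, events: Sequence[float]) -> float:
        """Estimated total power for a vector of event counts."""
        if self.weights is None:
            return self.static_power
        return self.static_power + sum(w * e for w, e in zip(self.weights, events))

    def observe(self, events: Sequence[float], rapl_power: float) -> bool:
        """Record a sample; refit if the error is too large. Returns True on refit."""
        self.history.append((list(events), rapl_power - self.static_power))
        self.history = self.history[-self.window:]
        error = abs(rapl_power - self.estimate(events))
        if self.weights is None or error > self.error_threshold:
            X = [e for e, _ in self.history]
            y = [p for _, p in self.history]
            self.weights = ridge_fit(X, y, self.alpha)
            return True
        return False


model = PowerModel(static_power=10.0, error_threshold=0.1)
first = model.observe((1.0, 0.0), 12.0)   # no model yet, so it is fitted
second = model.observe((0.0, 1.0), 10.5)  # error above threshold, so it is refitted
third = model.observe((1.0, 1.0), 12.5)   # error below threshold, model is kept
```

In this toy run the synthetic readings follow 10 W of static power plus 2 W and 0.5 W per unit of the two events, so the model stabilizes after two refits. A per-frequency variant, closer to the frequency-aware scheme the summary describes, could keep one such model per frequency in a dictionary.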

Abstract

Fine-grained power monitoring of software activities becomes unavoidable to maximize the power usage efficiency of data centers. In particular, achieving an optimal scheduling of containers requires the deployment of software-defined power meters to go beyond the granularity of hardware power monitoring sensors, such as Power Distribution Units (PDU) or Intel's Running Average Power Limit (RAPL), to deliver power estimations of activities at the granularity of software containers. However, the definition of the underlying power models that estimate the power consumption remains a long and fragile process that is tightly coupled to the host machine. To overcome these limitations, this paper introduces SmartWatts: a lightweight power monitoring system that adopts online calibration to automatically adjust the CPU and DRAM power models in order to maximize the accuracy of runtime power estimations of containers. Unlike state-of-the-art techniques, SmartWatts does not require any a priori training phase or hardware equipment to configure the power models and can therefore be deployed on a wide range of machines including the latest power optimizations, at no cost.

Paper Structure

This paper contains 20 sections, 7 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of SmartWatts
  • Figure 2: Deployment of SmartWatts
  • Figure 3: Evolution of the PKG & DRAM power consumption along time and containers
  • Figure 4: Illustrating the activity of the kernel when flooding UDP
  • Figure 5: Global & per-frequency error rate of the PKG power models
  • ...and 4 more figures