Taming and Controlling Performance and Energy Trade-offs Automatically in Network Applications
Han Dong, Yara Awad, Sanjay Arora, Orran Krieger, Jonathan Appavoo
TL;DR
This work tackles energy efficiency for latency-sensitive cloud applications by treating a server as a black box and exploiting stability in offered load. It demonstrates that external control of interrupt coalescing (ITR-delay) and CPU frequency (DVFS) can discover energy-efficient sweet spots that meet tail-latency SLA targets, without software changes. Through an extensive energy study on Linux and a specialized library OS (EbbRT), it shows energy reductions of up to 60% without software changes, with the specialized OS achieving more than 2× better energy efficiency than the general-purpose OS. The authors propose a Bayesian-optimization-based controller that models performance-energy trade-offs and adapts to varying workloads and hardware, generalizing across applications (e.g., Memcached, Tailbench workloads) and platforms. The results have practical impact for data centers seeking energy proportionality: simple hardware controls combined with black-box search can automate energy-aware SLA enforcement.
Abstract
In this paper, we demonstrate that a server running a single latency-sensitive application can be treated as a black box to reduce energy consumption while meeting an SLA target. We find that when the mean offered load is stable, one can find "sweet spot" settings for packet batching (via interrupt coalescing) and processing rate (via DVFS) that represent optimal trade-offs in the interactions of the software stack and hardware with the arrival rate and composition of requests currently being served. By trying a few combinations of settings on the live system, an example Bayesian optimizer can find settings that reduce energy consumption while meeting a desired tail latency for the current load. This research demonstrates that: 1) without software changes, dramatic energy savings (up to 60%) can be achieved across diverse hardware systems if one controls batching and processing rate, 2) specialized research OSes that have been developed for performance can achieve more than 2x better energy efficiency than general-purpose OSes, and 3) a controller, agnostic to the application and system, can easily find energy-efficient settings for the offered load that meet SLA objectives.
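The black-box search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' controller: plain random sampling stands in for the Bayesian optimizer, and `measure` is a synthetic model with invented formulas rather than a measurement of a live server. The knob ranges, function names, and the 500 µs SLA are all assumptions made for illustration.

```python
import random

# Hypothetical knob ranges; real values depend on the NIC driver and CPU.
ITR_DELAYS_US = list(range(2, 402, 20))  # interrupt-coalescing delay, microseconds
CPU_FREQS_GHZ = [round(1.2 + 0.1 * i, 1) for i in range(18)]  # DVFS, 1.2 to 2.9 GHz


def measure(itr_us, freq_ghz):
    """Synthetic stand-in for one measurement interval on the live system.

    Returns (energy_joules, p99_latency_us). Both formulas are invented for
    illustration: longer ITR delays cut interrupt overhead but add queueing
    delay, and higher frequencies cut service time but cost more power.
    """
    energy = 20.0 + 4.0 * freq_ghz ** 3 + 400.0 / itr_us
    p99 = itr_us + 300.0 / freq_ghz
    return energy, p99


def search(sla_p99_us, trials=30, seed=0):
    """Return the lowest-energy (energy, itr_us, freq_ghz, p99) that meets the SLA.

    Returns None if no sampled setting meets the SLA. Random sampling is used
    here in place of a Bayesian optimizer.
    """
    rng = random.Random(seed)
    # Start from a conservative baseline: minimal batching, maximum frequency.
    candidates = [(ITR_DELAYS_US[0], CPU_FREQS_GHZ[-1])]
    candidates += [
        (rng.choice(ITR_DELAYS_US), rng.choice(CPU_FREQS_GHZ))
        for _ in range(trials)
    ]
    best = None
    for itr_us, freq_ghz in candidates:
        energy, p99 = measure(itr_us, freq_ghz)
        if p99 > sla_p99_us:
            continue  # violates the tail-latency target
        if best is None or energy < best[0]:
            best = (energy, itr_us, freq_ghz, p99)
    return best


if __name__ == "__main__":
    print(search(sla_p99_us=500.0))
```

A Bayesian optimizer would replace the random sampling with a surrogate model that proposes the next setting to try, which is how a controller could get by with only a few trials on the live system.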
