Taming and Controlling Performance and Energy Trade-offs Automatically in Network Applications

Han Dong, Yara Awad, Sanjay Arora, Orran Krieger, Jonathan Appavoo

TL;DR

This work tackles energy efficiency for latency-sensitive cloud applications by treating a server as a black box and exploiting stability in offered load. It demonstrates that external control of interrupt coalescing (ITR-delay) and CPU frequency (DVFS) can discover energy-efficient sweet spots that meet tail-latency SLA targets, without software changes. Through an extensive energy study on Linux and a specialized library OS (EbbRT), it shows up to ~60% energy reductions and up to ~2× energy efficiency gains, with OS specialization offering additional benefits. The authors propose a Bayesian-optimization-based controller that models performance-energy trade-offs and adapts to varying workloads and hardware, achieving broad generality across apps (e.g., Memcached, Tailbench workloads) and platforms. The results have practical impact for data centers seeking energy proportionality by leveraging simple hardware controls and black-box search to automate energy-aware SLA enforcement.

Abstract

In this paper, we demonstrate that a server running a single latency-sensitive application can be treated as a black box to reduce energy consumption while meeting an SLA target. We find that when the mean offered load is stable, one can find "sweet spot" settings for packet batching (via interrupt coalescing) and processing rate (via DVFS) that represent optimal trade-offs in the interactions of the software stack and hardware with the arrival rate and composition of requests currently being served. By trying a few combinations of settings on the live system, an example Bayesian optimizer can find settings that reduce energy consumption while meeting a desired tail latency for the current load. This research demonstrates that: 1) without software changes, dramatic energy savings (up to 60%) can be achieved across diverse hardware systems if one controls batching and processing rate, 2) specialized research OSes that have been developed for performance can achieve more than 2x better energy efficiency than general-purpose OSes, and 3) a controller, agnostic to the application and system, can easily find energy-efficient settings for the offered load that meet SLA objectives.
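The black-box search described in the abstract can be illustrated with a minimal sketch. It is not the authors' controller: a seeded random search stands in for the Bayesian optimizer, and `measure`, the candidate grids, and the toy energy and latency formulas are all hypothetical placeholders for measurements that would be taken on a live server.

```python
import random

# Hypothetical candidate settings; real ranges depend on the NIC and CPU.
ITR_CHOICES = list(range(0, 401, 50))        # ITR-delay in microseconds
FREQ_CHOICES = [1.2, 1.6, 2.0, 2.4, 2.9]     # CPU frequency in GHz


def measure(itr_us, freq_ghz):
    """Toy stand-in for one live run; returns (energy, p99_latency_us).

    Assumption for illustration only: more batching and a lower frequency
    save energy but add tail latency.
    """
    energy = 40.0 + 30.0 * freq_ghz ** 2 - 0.05 * itr_us
    p99_us = itr_us + 300.0 / freq_ghz
    return energy, p99_us


def search(sla_us, budget=30, seed=0):
    """Return the lowest-energy (energy, itr, freq, p99) seen that meets the SLA."""
    rng = random.Random(seed)
    # First trial is a conservative baseline: no batching, highest frequency.
    trials = [(0, max(FREQ_CHOICES))]
    trials += [(rng.choice(ITR_CHOICES), rng.choice(FREQ_CHOICES))
               for _ in range(budget - 1)]
    best = None
    for itr_us, freq_ghz in trials:
        energy, p99_us = measure(itr_us, freq_ghz)
        if p99_us <= sla_us and (best is None or energy < best[0]):
            best = (energy, itr_us, freq_ghz, p99_us)
    return best  # None if no trial met the SLA


if __name__ == "__main__":
    print(search(sla_us=500.0))
```

A Bayesian optimizer would replace the random draws with a model-guided choice of the next (ITR-delay, DVFS) pair, which is what lets it converge after only a few trials; the constraint check against the tail-latency target stays the same.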

Paper Structure

This paper contains 35 sections, 5 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: NetPIPE and NodeJS Webserver performance and energy results for different message sizes. Every datapoint is the result of a single experimental run with a unique (ITR-delay, DVFS) combination, except Linux-default, which has the dynamic ITR-delay and DVFS algorithms enabled instead. LibOS refers to EbbRT. The X-axis is a measure of performance (lower is better) and the Y-axis shows the total energy consumed. For Linux-tuned (or Linux-static) and LibOS-tuned (or EbbRT-static), the labeled (ITR-delay, DVFS) pairs are the experimental values that resulted in the lowest energy use. LibOS-poll shows EbbRT with a run-to-completion polling loop at different processor frequencies (shown as change in gradient colors). Note: The X and Y scales are different to show the structure of collected data.
  • Figure 2: ITR-delay values set by Linux's dynamic ITR-delay algorithm. This is captured during a live run of NetPIPE at 64 KB message size.
  • Figure 3: Memcached: Each point represents a single experimental run. The *-static data points use a unique (ITR-delay, DVFS) pair. We only illustrate data that lie on the Pareto-optimal curve. The X-axis shows performance measurement (lower is better) and the Y-axis shows total energy consumed. Linux results for 1000K and 1500K QPS loads are not shown as Linux could not support them without violating SLA.
  • Figure 4: Memcached: ITR-delay impact on instruction count ($1e11$). Not drawn to scale to show structure in data.
  • Figure 5: Illustrates the change in energy and 99th-percentile latency as different (ITR-delay, DVFS) pairs are explored for Linux memcached.
  • ...and 9 more figures
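
The Figure 3 caption mentions showing only data on the Pareto-optimal curve. As a minimal sketch of what that filtering could look like (the function name and the sample points are hypothetical, not taken from the paper), the following keeps only runs that no other run beats on both latency and energy, with lower being better on both axes:

```python
def pareto_front(points):
    """Return the non-dominated (latency, energy) points, sorted by latency.

    A point is dropped if another point is at least as good on both axes;
    exact duplicates are reported once.
    """
    front = []
    best_energy = float("inf")
    # Sweep in order of increasing latency; a point survives only if it
    # uses strictly less energy than every faster (or equally fast) point.
    for latency, energy in sorted(points):
        if energy < best_energy:
            front.append((latency, energy))
            best_energy = energy
    return front


if __name__ == "__main__":
    # Made-up (latency, energy) measurements, one per experimental run.
    runs = [(3, 10), (1, 30), (2, 20), (2, 25), (4, 15)]
    print(pareto_front(runs))
```

Sorting first keeps the sweep at O(n log n), which is convenient when every (ITR-delay, DVFS) pair contributes one point.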