Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks

Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth

TL;DR

This paper introduces buoyancy, a novel abstraction for characterizing workload performance in multi-tenant systems that facilitates resource-aware and application-aware orchestration in a manner that is intuitive, extensible, and generalizable across heterogeneous platforms.

Abstract

Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU utilization or application-level metrics, are often insufficient to capture the complex performance dynamics arising from resource contention and noisy-neighbor effects. In such environments, performance bottlenecks may emerge in any shared system resource, leading to unexpected and difficult-to-diagnose degradation. This paper introduces buoyancy, a novel abstraction for characterizing workload performance in multi-tenant systems. Unlike traditional approaches, buoyancy integrates application-level metrics with system-level insights into shared resource contention to provide a holistic view of performance dynamics. By explicitly capturing bottlenecks and headroom across multiple resources, buoyancy facilitates resource-aware and application-aware orchestration in a manner that is intuitive, extensible, and generalizable across heterogeneous platforms. We evaluate buoyancy using representative multi-tenant workloads to illustrate its ability to expose performance-limiting resource interactions. On average, buoyancy provides a 19.3% better indication of bottlenecks than traditional heuristics. We additionally show how buoyancy can act as a drop-in replacement for conventional performance metrics, enabling improved observability and more informed scheduling and optimization decisions.

Paper Structure (31 sections, 9 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Illustration of the intuition behind the buoyancy concept. Here, the application is represented by a ship floating on a body of water. The goal is to keep the ship afloat, which is analogous to keeping the workload within its performance limits. Additional load or interference may cause the ship to sink, while adding resources increases the buoyancy of the ship. A ship with greater buoyancy has more margin and can withstand larger increases in load or interference without sinking than a ship barely keeping above the waterline.
  • Figure 2: The normalized performance of different workloads as their allocation of CPU and LLC changes. All other resource allocations and workload parameters are kept constant. Some applications clearly benefit more than others from the additional allocation in each domain.
  • Figure 3: A typical control loop of an intent-driven system that manages resource allocation.
  • Figure 4: Illustration of the relationship between resources, resource scores, workload KPIs, and buoyancy scores.
  • Figure 5: The effects of resource scores and performance slack on the buoyancy score. The highlighted area shows a region where the buoyancy score $b \leq 0.1$, indicating that the workload is approaching a violation. In the first case, only a single resource score is considered. In the other cases, a second resource score is added and kept constant as indicated by the figure titles. As can be observed, even if the SLO slack is large, buoyancy can indicate an approaching bottleneck if resource scores are high.
  • ...and 5 more figures
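
The qualitative behavior described in the Figure 5 caption can be illustrated with a small sketch. The paper's actual definition of the buoyancy score is not reproduced in this section, so the formula below is purely an assumption chosen for illustration: it multiplies a normalized SLO slack by the remaining headroom of each resource, where a resource score near 1 is taken to mean heavy contention. The function names `buoyancy` and `approaching_violation` are hypothetical; only the threshold $b \leq 0.1$ comes from the caption.

```python
from math import prod


def _clamp01(x: float) -> float:
    """Clamp a value into the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def buoyancy(slo_slack: float, resource_scores: list[float]) -> float:
    """Hypothetical buoyancy score in [0, 1]; NOT the paper's formula.

    Assumptions made for this sketch:
      - slo_slack is normalized to [0, 1], where 1 means the workload is far
        from its performance limit and 0 means it is at the limit.
      - each resource score is in [0, 1], where 1 means the shared resource
        is fully contended.
    """
    slack = _clamp01(slo_slack)
    # Remaining headroom across all resources; an empty list yields 1.0.
    headroom = prod(1.0 - _clamp01(r) for r in resource_scores)
    return slack * headroom


def approaching_violation(b: float, threshold: float = 0.1) -> bool:
    """Flag the region highlighted in Figure 5 (b <= 0.1)."""
    return b <= threshold


if __name__ == "__main__":
    # Large SLO slack with low contention: comfortably afloat.
    print(approaching_violation(buoyancy(0.9, [0.2])))
    # Same slack, but one resource is heavily contended.
    print(approaching_violation(buoyancy(0.9, [0.95])))
    # A second, highly contended resource lowers the score further.
    print(approaching_violation(buoyancy(0.9, [0.2, 0.9])))
```

Under these assumptions, a workload with large SLO slack is still flagged once any resource score gets close to 1, which matches the observation in the caption. Any other monotone combination of slack and resource scores would serve the same illustrative purpose.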