
Arcus: SLO Management for Accelerators in the Cloud with Traffic Shaping

Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger

TL;DR

The paper presents an SLO-aware protocol coupled with an offloaded interface, built on an architecture that supports precise and scalable traffic shaping. Together they guarantee accelerator SLOs in various circumstances, with up to 45% tail latency reduction and less than 1% throughput variance.

Abstract

Cloud servers use accelerators for common tasks (e.g., encryption, compression, hashing) to improve CPU/GPU efficiency and overall performance. However, users' Service-Level Objectives (SLOs) can be violated due to accelerator-related contention. The root cause is that existing solutions for accelerators focus only on isolating or fairly allocating compute and memory resources; they overlook contention for communication-related resources. Specifically, three communication-induced challenges drive us to rethink the problem: (1) accelerator traffic patterns are diverse, hard to predict, and mixed across users; (2) communication-related components lack effective, configurable low-level isolation mechanisms; and (3) the computational heterogeneity of accelerators leads to unique relationships between the traffic mixture and the corresponding accelerator performance. This work focuses on meeting SLOs in accelerator-rich systems. We present Arcus, which treats accelerator SLO management as traffic management with proactive traffic shaping. We develop an SLO-aware protocol coupled with an offloaded interface on an architecture that supports precise and scalable traffic shaping. We guarantee accelerator SLOs in various circumstances, with up to 45% tail latency reduction and less than 1% throughput variance.
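
To make "proactive traffic shaping" concrete, the following is a minimal sketch of a generic token-bucket shaper, assuming each tenant's accelerator traffic is described by a byte rate and a burst size. The function `shape` and its parameters are hypothetical illustrations, not Arcus's interface or algorithm; the abstract states only that Arcus performs precise and scalable shaping on an offloaded interface. A shaper delays excess requests instead of dropping them, which limits how much of one tenant's burst can reach the shared accelerator at once.

```python
from typing import List, Sequence


def shape(arrivals: Sequence[float], sizes: Sequence[float],
          rate: float, burst: float) -> List[float]:
    """Generic token-bucket shaper (hypothetical, not Arcus's design).

    Returns the time at which each request is released to the accelerator.
    Requests are served FIFO; a request leaves as soon as the bucket holds
    enough tokens (bytes) for it, otherwise it is delayed, never dropped.
    rate:  token refill rate in bytes per second (assumed per-tenant budget).
    burst: bucket capacity in bytes (largest burst released back-to-back).
    """
    if rate <= 0 or burst <= 0:
        raise ValueError("rate and burst must be positive")
    tokens = burst   # bucket starts full
    clock = 0.0      # time of the last token update (= last release time)
    released: List[float] = []
    for arrival, size in zip(arrivals, sizes):
        if size > burst:
            raise ValueError("request larger than the bucket can never be released")
        # FIFO: a request cannot leave before the one queued ahead of it.
        start = max(arrival, clock)
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        tokens = min(burst, tokens + (start - clock) * rate)
        if tokens >= size:
            depart = start
            tokens -= size
        else:
            # Wait exactly long enough to accumulate the missing tokens;
            # those tokens are then spent, leaving the bucket empty.
            depart = start + (size - tokens) / rate
            tokens = 0.0
        clock = depart
        released.append(depart)
    return released
```

For example, with `rate=100` bytes/s and `burst=100` bytes, four 50-byte requests arriving together at t=0 are released at `[0, 0, 0.5, 1.0]`: the first two fit in the initial burst, and the rest are paced at the configured rate.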

Paper Structure

This paper contains 22 sections, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Accelerator management architectures.
  • Figure 2: Possible paths to invoke an accelerator.
  • Figure 3: Representative results of the profiling case studies (see the profiling-cases table).
  • Figure 4: Workflow of an Arcus-enabled system. CP: capacity planning, AC: admission control.
  • Figure 5: Arcus dataplane protocol for two modes of accelerator invocation. The funnel icon denotes traffic shaping.
  • ...and 6 more figures