
Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger

TL;DR

The paper investigates performance isolation for accelerators in public clouds and demonstrates that existing multi-tenant isolation struggles under diverse intra-host traffic and datapath bottlenecks. It reframes accelerator I/O as network-like flows and proposes proactive traffic shaping via an accelerator-management stack to realize Accelerator-as-a-Service with end-to-end SLAs. The approach emphasizes per-tenant, hardware-assisted control (including a pull-based protocol and separate accelerator QPs) to achieve predictable throughput across complex I/O paths, underpinned by detailed empirical characterization. This work highlights the practical significance of traffic-aware acceleration management for cloud providers and suggests concrete directions for designing future I/O stacks and device interfaces to support scalable, isolated accelerator sharing.

Abstract

I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA, and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantees and (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we show that the fundamental difficulty of democratizing accelerators is insufficient performance isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on the fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively reshape the traffic to avoid unpredictable contention. We discuss the implications of our findings for the design of future I/O management stacks and device interfaces.
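
To make the idea of treating accelerator requests as flows and shaping them before they reach a shared datapath more concrete, here is a minimal sketch. It uses a token bucket, a standard network shaping technique chosen here purely for illustration; the abstract does not say which shaping mechanism the authors use. The names `TokenBucket`, `shape`, `rate_bytes_per_s`, and `burst_bytes` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative per-tenant shaper (an assumption, not the paper's design)."""
    rate_bytes_per_s: float   # sustained rate granted to the tenant
    burst_bytes: float        # largest burst the tenant may send at once
    tokens: float = 0.0       # bytes the tenant may currently send
    last_time: float = 0.0    # timestamp of the previous refill

    def __post_init__(self):
        # Start with a full bucket so an initial burst is allowed.
        self.tokens = self.burst_bytes

    def try_send(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.last_time
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        # Admit the request only if the tenant has enough tokens.
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


def shape(requests, buckets):
    """Filter (time, tenant, size) requests through each tenant's bucket.

    Requests must be ordered by time. Returns the admitted requests.
    """
    admitted = []
    for now, tenant, size in requests:
        if buckets[tenant].try_send(size, now):
            admitted.append((now, tenant, size))
    return admitted
```

In this sketch each tenant's flow is limited independently, so one tenant exceeding its budget has its excess requests rejected rather than delaying another tenant's requests. A real system would more likely queue or pace the excess than drop it.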

Paper Structure (9 sections, 5 figures, 2 tables)

This paper contains 9 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: New services empowered by our proposal.
  • Figure 2: Accelerator profiling: compute throughput across message sizes.
  • Figure 3: Host-FPGA characterization results.
  • Figure 4: A system w/ accelerator management stack.
  • Figure 5: End-to-end scenarios. (a) A typical heterogeneous server with wild I/O contention, (b) function call mode, (c) inline mode, and (d) a complex use case.