ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

TL;DR

ESG schedules DNN inference workflows on serverless platforms with shareable GPUs by treating the GPU as a first-class resource and using an optimality-guided two-step approach. ESG_1Q formulates scheduling as a path-finding problem and solves it with A*-search and dual-blade pruning, while dominator-based SLO distribution scales the method to large DAG-like workflows. ESG_Dispatch maps the chosen configurations to Invokers with locality-aware decisions that favor data locality and warm starts. In the evaluation, ESG improves SLO hit rates by 61%-80% and saves 47%-187% in cost over prior work, with scheduling overhead under 10 ms and robustness to workload variations. The result is cost-effective, scalable GPU sharing for real-time ML inference on serverless platforms.

Abstract

Recent years have witnessed increasing interest in machine learning inference on serverless computing for its auto-scaling and cost-effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving substantial performance potential untapped. This paper presents ESG, a new scheduling algorithm that directly addresses these difficulties. ESG treats the shareable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method that combines A*-search with a novel dual-blade pruning to dramatically prune the scheduling space without compromising schedule quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG improves SLO hit rates by 61%-80% while saving 47%-187% in cost over prior work.
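
To make the path-finding view of scheduling more concrete, the following is a minimal Python sketch, not the paper's ESG_1Q algorithm. It assumes a linear pipeline in which each function has a handful of candidate configurations, each with a hypothetical (latency, cost) pair, and it searches for the cheapest sequence of configurations whose total latency stays within the SLO. The function name `cheapest_path_within_slo`, the data layout, and the single latency-feasibility prune are all illustrative assumptions; the dual-blade pruning and the dominator-based SLO distribution described above are not reproduced here.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical description of one candidate configuration: (latency_ms, cost).
Config = Tuple[float, float]


def cheapest_path_within_slo(
    stages: List[List[Config]], slo_ms: float
) -> Optional[Tuple[float, List[int]]]:
    """A*-style search over a linear pipeline (illustrative assumption only).

    stages[i] lists the candidate configurations of the i-th function.
    Returns (total_cost, chosen_config_index_per_stage), or None when no
    combination of configurations can meet the SLO.
    """
    n = len(stages)

    # Suffix lower bounds: the least cost and least latency that the
    # remaining stages i..n-1 could possibly add.
    min_cost = [0.0] * (n + 1)
    min_lat = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_cost[i] = min_cost[i + 1] + min(c for _, c in stages[i])
        min_lat[i] = min_lat[i + 1] + min(l for l, _ in stages[i])

    # Heap entries: (f = g + h, g = cost so far, latency so far,
    #                next stage index, config indices chosen so far).
    heap = [(min_cost[0], 0.0, 0.0, 0, [])]
    while heap:
        _, g, lat, i, path = heapq.heappop(heap)
        if i == n:
            # h is an admissible lower bound, so the first complete path
            # popped from the heap has the minimum cost.
            return g, path
        for k, (l, c) in enumerate(stages[i]):
            new_lat = lat + l
            # Feasibility prune: even the fastest remaining configurations
            # would miss the SLO, so this partial path is discarded.
            if new_lat + min_lat[i + 1] > slo_ms:
                continue
            new_g = g + c
            heapq.heappush(
                heap,
                (new_g + min_cost[i + 1], new_g, new_lat, i + 1, path + [k]),
            )
    return None


if __name__ == "__main__":
    # Made-up numbers for a three-function application.
    example = [
        [(10, 6), (30, 2)],
        [(20, 4), (50, 1)],
        [(15, 3), (40, 1)],
    ]
    print(cheapest_path_within_slo(example, slo_ms=80))
```

In this sketch the heuristic is the sum of the cheapest remaining configurations, which never overestimates the true remaining cost, so the search stays optimal for this toy model. A real scheduler would also have to account for batching, GPU sharing, and DAG-shaped workflows, which this example deliberately leaves out.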

Paper Structure

This paper contains 23 sections, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: OpenWhisk architecture. Controller is where scheduling happens.
  • Figure 2: The app-func-wise (AFW) job queues of two example ML-based applications, and the ESG algorithm workflow in handling one job queue.
  • Figure 3: (a) Top: Example of the configuration space of a three-function application and two configuration paths in the space. Bottom: the time and per-job resource costs of the two paths. (b) Basic ESG_1Q algorithm in pseudo-code.
  • Figure 4: Illustration of dominator-based SLO distribution.
  • Figure 5: Job arrival intervals used in the evaluation part for different workload settings.
  • ...and 7 more figures