ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Xinning Hui; Yuanchao Xu; Zhishan Guo; Xipeng Shen

TL;DR

ESG tackles the challenge of scheduling DNN inferences on serverless platforms with shareable GPUs by treating GPUs as a first-class resource and using an optimality-guided two-step approach. The first step, ESG_1Q, formulates scheduling as a path-finding problem and solves it with A*-search and dual-blade pruning, while dominator-based SLO distribution scales the method to large DAG-like workflows. The second step, ESG_Dispatch, maps the chosen configurations to Invokers using locality-aware decisions that favor data locality and warm starts. The reported results show ESG delivering 61-80% higher SLO hit rates and 47-187% cost savings over prior work, with scheduling overhead under 10 ms and robustness to workload variations. This work enables cost-effective, scalable GPU sharing for real-time ML inference on serverless platforms.

Abstract

Recent years have witnessed increasing interest in machine learning inference on serverless computing because of its auto-scaling and cost-effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space, which is dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving substantial performance potential untapped. This paper presents ESG, a new scheduling algorithm that directly addresses these difficulties. ESG treats shareable GPUs as a first-order factor in scheduling. It employs an optimality-guided adaptive method that combines A*-search with a novel dual-blade pruning to dramatically prune the scheduling space without compromising schedule quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve SLO hit rates by 61%-80% while saving 47%-187% in costs over prior work.
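To make the path-finding view of scheduling more concrete, the following is a minimal sketch, not the authors' ESG_1Q algorithm. It assumes a linear pipeline in which each function has a few candidate configurations, each modeled as a hypothetical `(latency_ms, cost)` pair, and it looks for the cheapest sequence of configurations whose total latency fits within an SLO. The function name `cheapest_path_within_slo`, the data layout, and the numbers are all illustrative assumptions. A single latency-feasibility check stands in for pruning; the paper's dual-blade pruning and dominator-based SLO distribution are not reproduced here.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical per-stage option: (latency_ms, cost). Not the paper's data model.
Config = Tuple[float, float]


def cheapest_path_within_slo(
    stages: List[List[Config]], slo_ms: float
) -> Optional[Tuple[float, List[int]]]:
    """A*-style search for the cheapest config choice per stage under an SLO.

    Returns (total_cost, chosen_config_indices), or None if no path fits.
    """
    n = len(stages)
    # Suffix lower bounds: least latency and least cost still needed from stage i on.
    min_lat = [0.0] * (n + 1)
    min_cost = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_lat[i] = min_lat[i + 1] + min(lat for lat, _ in stages[i])
        min_cost[i] = min_cost[i + 1] + min(cost for _, cost in stages[i])

    # Heap entries: (cost so far + optimistic remaining cost,
    #                cost so far, latency so far, chosen indices).
    frontier = [(min_cost[0], 0.0, 0.0, [])]
    while frontier:
        _, cost, lat, picks = heapq.heappop(frontier)
        i = len(picks)
        if i == n:
            # The heuristic never overestimates, so the first complete
            # path popped from the heap has minimal cost.
            return cost, picks
        for k, (step_lat, step_cost) in enumerate(stages[i]):
            new_lat = lat + step_lat
            # Prune: even the fastest remaining configs would miss the SLO.
            if new_lat + min_lat[i + 1] > slo_ms:
                continue
            new_cost = cost + step_cost
            heapq.heappush(
                frontier,
                (new_cost + min_cost[i + 1], new_cost, new_lat, picks + [k]),
            )
    return None


# Illustrative three-function pipeline with two made-up configs per function.
example_stages = [
    [(10.0, 5.0), (30.0, 2.0)],
    [(20.0, 4.0), (50.0, 1.5)],
    [(15.0, 3.0), (40.0, 1.0)],
]
result = cheapest_path_within_slo(example_stages, slo_ms=80.0)
```

With these made-up numbers, the cheapest configuration of every function cannot be combined without exceeding the 80 ms budget, so the search trades some cost for latency on the later functions. Ordering the frontier by cost so far plus an optimistic remaining cost is what lets the search stop at the first complete path it pops.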

Paper Structure (23 sections, 12 figures, 4 tables, 1 algorithm)

Introduction
Background
Solution: ESG Scheduling Algorithm
Overview
Two-step Design.
Resource Model and Task Model
ESG_1Q Algorithm
A*-Search with Dual-Bladed Pruning in ESG_1Q
Dominator-based SLO Distribution for Scalability
ESG_Dispatch: Mapping to Worker Nodes
Methodology for Evaluation
Applications
Comparison Counterparts
Evaluation
End-to-End Performance
...and 8 more sections

Figures (12)

Figure 1: OpenWhisk architecture. Controller is where scheduling happens.
Figure 2: The app-func-wise (AFW) job queues of two example ML-based applications, and the ESG algorithm workflow in handling one job queue.
Figure 3: (a) Top: Example of the configuration space of a three-function application and two configuration paths in the space. Bottom: the time and per-job resource costs of the two paths. (b) Basic ESG_1Q algorithm in pseudo-code.
Figure 4: Illustration of dominator-based SLO distribution.
Figure 5: Job arrival intervals used in the evaluation part for different workload settings.
...and 7 more figures
