Towards Fast Setup and High Throughput of GPU Serverless Computing

Han Zhao; Weihao Cui; Quan Chen; Shulai Zhang; Zijun Li; Jingwen Leng; Chao Li; Deze Zeng; Minyi Guo

TL;DR

This paper tackles the long setup times and limited throughput of GPU-enabled serverless platforms such as FixedGSL by introducing SAGE, a GPU serverless framework that decouples data preparation from GPU context creation and enables memory sharing across invocations. It presents two core innovations: a parallelized function setup mechanism (via a unified memory daemon and a taxon shim) and a sharing-based memory management scheme (including read-only memory sharing and a multi-stage resource exit). The design is complemented by a practical programming model (a C++ runtime, SageInit/SageRun, and the SageLoadToGPU/SageDumpToDB APIs) and a detailed evaluation on NVIDIA A100 GPUs across deep learning (DL) and scientific workloads, which shows up to an 11.3× reduction in function duration and a 1.22× improvement in function density, among other gains. Together, these results indicate that fine-grained memory management and concurrent preparation substantially improve GPU serverless efficiency, allowing higher function density and lower operational cost for bursty workloads.
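The read-only memory sharing idea can be illustrated with a small reference-counted pool: the first invocation of a function pays for loading its read-only data, later concurrent invocations of the same function reuse that single copy, and the copy is freed only after the last user releases it. This is a minimal Python sketch under stated assumptions, not SAGE's implementation: `ReadOnlyMemoryPool`, `acquire`, `release`, and the loader callable are hypothetical names, host `bytes` objects stand in for GPU buffers, and the freeing policy is a simplification rather than the paper's multi-stage resource exit.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SharedEntry:
    """Read-only data for one function, shared by all of its live invocations."""
    data: bytes
    refcount: int = 0


class ReadOnlyMemoryPool:
    """Hypothetical per-node pool holding one read-only copy per function."""

    def __init__(self, loader: Callable[[str], bytes]):
        # loader stands in for fetching weights/inputs from storage (assumption).
        self._loader = loader
        self._entries: Dict[str, SharedEntry] = {}
        self._lock = threading.Lock()
        self.loads = 0  # number of times data preparation actually ran

    def acquire(self, func: str) -> bytes:
        with self._lock:
            entry = self._entries.get(func)
            if entry is None:
                # Only the first concurrent invocation prepares the data;
                # the load runs under the lock to keep the sketch simple.
                entry = SharedEntry(self._loader(func))
                self._entries[func] = entry
                self.loads += 1
            entry.refcount += 1
            return entry.data  # same object for every holder: no extra copy

    def release(self, func: str) -> None:
        with self._lock:
            entry = self._entries[func]
            entry.refcount -= 1
            if entry.refcount == 0:
                # Last user gone: drop the shared copy (simplified policy).
                del self._entries[func]

    def resident(self) -> List[str]:
        with self._lock:
            return sorted(self._entries)
```

For example, two overlapping invocations of the same function trigger a single load:

```python
pool = ReadOnlyMemoryPool(lambda name: b"\x00" * 1024)
a = pool.acquire("resnet")
b = pool.acquire("resnet")  # reuses the copy loaded for `a`
pool.release("resnet"); pool.release("resnet")
```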

Abstract

Integrating GPUs into serverless computing platforms is crucial for improving efficiency. However, existing GPU-enabled serverless platforms face two significant problems caused by coarse-grained GPU management: long setup time and low function throughput. To address these issues, we propose SAGE, a GPU serverless framework with fast setup and high throughput. First, because the data a GPU function needs is known before it actually executes, SAGE devises a parallelized function setup mechanism that performs data preparation and context creation in parallel. In this way, SAGE achieves fast setup of GPU function invocations. Second, SAGE proposes a sharing-based memory management mechanism that shares read-only memory and context memory across multiple invocations of the same function. Memory sharing avoids repeated data preparation and thus unnecessary data-loading contention, which improves function throughput. Our experimental results show that SAGE reduces function duration by 11.3× and improves function density by 1.22× compared to the state-of-the-art serverless platform.
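Because a function's data is known before execution, data preparation does not have to wait for the GPU context to exist, so the two setup stages can overlap and setup time drops from roughly their sum to roughly the longer of the two. The following is a minimal Python sketch of that overlap under stated assumptions: `prepare_data`, `create_context`, and the two durations are hypothetical placeholders that only sleep, and no real GPU or CUDA calls are made.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Illustrative durations only; not measurements from the paper.
PREP_SECONDS = 0.20  # simulated data preparation (e.g., loading weights)
CTX_SECONDS = 0.15   # simulated GPU context creation


def prepare_data(func: str) -> bytes:
    time.sleep(PREP_SECONDS)  # stands in for I/O-bound data loading
    return f"{func}-weights".encode()


def create_context(device: int) -> str:
    time.sleep(CTX_SECONDS)  # stands in for GPU context creation latency
    return f"ctx-on-gpu{device}"


def setup_sequential(func: str, device: int):
    # Baseline: context creation starts only after data preparation ends.
    data = prepare_data(func)
    ctx = create_context(device)
    return data, ctx


def setup_parallel(func: str, device: int):
    # Both stages run concurrently since neither depends on the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(prepare_data, func)
        ctx_future = pool.submit(create_context, device)
        return data_future.result(), ctx_future.result()


def timed(fn, *args):
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(setup_sequential, "resnet", 0)
    _, t_par = timed(setup_parallel, "resnet", 0)
    print(f"sequential {t_seq:.2f}s, parallel {t_par:.2f}s")
```

The sketch shows only the overlap of the two stages; it does not model SAGE's unified memory daemon or how prepared data is moved into GPU memory.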

