SCOPE: Performance Testing for Serverless Computing

Jinfeng Wen; Zhenpeng Chen; Jianshu Zhao; Federica Sarro; Haodi Ping; Ying Zhang; Shangguang Wang; Xuanzhe Liu

TL;DR

This work tackles the challenge of obtaining accurate end-to-end latency measurements for serverless functions in highly dynamic cloud environments. It introduces SCOPE, a serverless-specific performance-testing approach that uses a dual stopping criterion (an accuracy check and a consistency check) based on non-parametric confidence intervals (CIs) for percentile latencies (25th, 50th, and 75th) to decide when enough repetitions have been collected. SCOPE offers three CI implementations (order-statistic, basic bootstrap, and block bootstrap) and achieves markedly higher accuracy (over 97%) and reliability than state-of-the-art baselines (PT4Cloud, Metior, and CONFIRM) across 65 functions from a public dataset. The results indicate that SCOPE generalizes well across cold and warm starts, diverse triggers, and bursty workloads, with practical implications for reducing testing overhead and tailoring performance measurements to individual functions. The work also provides a public repository for replication and outlines future directions to broaden function coverage and integrate SCOPE into developer tooling.
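The three CI implementations named above can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not SCOPE's code: the order-statistic interval uses the normal approximation to the binomial to choose its ranks, both bootstrap variants report the textbook basic (reverse-percentile) interval, and the function names, the 1,000 resamples, and the block length of 10 are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def percentile(xs, p):
    """Linear-interpolation estimate of the p-quantile, 0 <= p <= 1."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def order_statistic_ci(xs, p, conf=0.95):
    """Distribution-free CI for the p-quantile built from order statistics.
    Ranks j and k come from the normal approximation to Binomial(n, p)."""
    s = sorted(xs)
    n = len(s)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    spread = z * math.sqrt(n * p * (1 - p))
    j = max(1, math.floor(n * p - spread))     # 1-based lower rank
    k = min(n, math.ceil(n * p + spread) + 1)  # 1-based upper rank
    return s[j - 1], s[k - 1]


def _basic_interval(estimate, boot, conf):
    """Basic (reverse-percentile) bootstrap interval from replicates."""
    alpha = 1 - conf
    q_lo = percentile(boot, alpha / 2)
    q_hi = percentile(boot, 1 - alpha / 2)
    return 2 * estimate - q_hi, 2 * estimate - q_lo


def basic_bootstrap_ci(xs, p, conf=0.95, b=1000, rng=None):
    """Resample individual latencies i.i.d. with replacement."""
    rng = rng or random.Random(0)
    est = percentile(xs, p)
    boot = [percentile(rng.choices(xs, k=len(xs)), p) for _ in range(b)]
    return _basic_interval(est, boot, conf)


def block_bootstrap_ci(xs, p, conf=0.95, b=1000, block=10, rng=None):
    """Moving-block bootstrap: resample runs of `block` consecutive
    latencies so that correlation between nearby repetitions survives."""
    rng = rng or random.Random(0)
    n = len(xs)
    block = min(block, n)
    starts = range(n - block + 1)
    est = percentile(xs, p)
    boot = []
    for _ in range(b):
        sample = []
        while len(sample) < n:
            i = rng.choice(starts)
            sample.extend(xs[i:i + block])
        boot.append(percentile(sample[:n], p))  # trim to the original size
    return _basic_interval(est, boot, conf)


if __name__ == "__main__":
    # Synthetic latencies in ms (hypothetical data, not from the paper).
    gen = random.Random(42)
    latencies = [gen.lognormvariate(4.0, 0.3) for _ in range(200)]
    for p in (0.25, 0.50, 0.75):
        print(p, round(percentile(latencies, p), 2),
              order_statistic_ci(latencies, p),
              basic_bootstrap_ci(latencies, p),
              block_bootstrap_ci(latencies, p))
```

The order-statistic interval needs no resampling and is cheap to recompute after every batch, whereas the block bootstrap costs more but tolerates serial dependence between consecutive invocations (for example, a warm instance serving several requests in a row).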

Abstract

Serverless computing is a popular cloud computing paradigm that has been widely adopted for various online workloads. It allows software engineers to develop cloud applications as a set of functions (called serverless functions). However, accurately measuring the performance (i.e., end-to-end response latency) of serverless functions is challenging because of the highly dynamic environment in which they run. A potential solution is to apply the stopping checks of existing performance testing techniques, which determine how many repetitions of a given serverless function, across a range of inputs, are needed to account for performance fluctuations. However, the literature lacks performance testing approaches designed explicitly for serverless computing. In this paper, we propose SCOPE, the first serverless computing-oriented performance testing approach. SCOPE takes into account the unique performance characteristics of serverless functions, such as their short execution durations and on-demand triggering. As such, SCOPE is designed as a fine-grained analysis approach. SCOPE incorporates an accuracy check and a consistency check to obtain accurate and reliable performance measurements of serverless functions. The evaluation shows that SCOPE provides testing results with 97.25% accuracy, 33.83 percentage points higher than the best currently available technique. Moreover, SCOPE outperforms the state-of-the-art techniques on every function we study.
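One way to picture how an accuracy check and a consistency check jointly decide the number of repetitions is the loop below. It is a hypothetical sketch, not the paper's definition of either check: we assume the accuracy check requires each of the 25th, 50th, and 75th percentile CIs to lie within ±r of its point estimate, and we stand in for the consistency check by requiring that accuracy check to pass at `streak` consecutive checkpoints. `measure`, `run_until_stable`, `accurate`, and every default value are illustrative names and numbers.

```python
import math
import random
from statistics import NormalDist

PERCENTILES = (0.25, 0.50, 0.75)


def percentile(xs, p):
    """Linear-interpolation estimate of the p-quantile."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def order_statistic_ci(xs, p, conf=0.95):
    """Non-parametric CI for the p-quantile (normal approx. for ranks)."""
    s = sorted(xs)
    n = len(s)
    spread = NormalDist().inv_cdf(0.5 + conf / 2) * math.sqrt(n * p * (1 - p))
    j = max(1, math.floor(n * p - spread))
    k = min(n, math.ceil(n * p + spread) + 1)
    return s[j - 1], s[k - 1]


def accurate(xs, r):
    """Assumed accuracy check: every percentile CI lies within +/- r
    (a fraction, e.g. 0.05 for 5%) of its point estimate."""
    for p in PERCENTILES:
        est = percentile(xs, p)
        lo, hi = order_statistic_ci(xs, p)
        if lo < est * (1 - r) or hi > est * (1 + r):
            return False
    return True


def run_until_stable(measure, r=0.05, batch=20, streak=3, max_reps=5000):
    """Call the hypothetical `measure()` in batches until the accuracy
    check passes at `streak` consecutive checkpoints (our stand-in for
    a consistency check). Returns (samples, converged)."""
    xs, passes = [], 0
    while len(xs) < max_reps:
        xs.extend(measure() for _ in range(batch))
        passes = passes + 1 if accurate(xs, r) else 0  # reset on failure
        if passes >= streak:
            return xs, True
    return xs, False


if __name__ == "__main__":
    gen = random.Random(7)
    fake_invoke = lambda: gen.lognormvariate(4.0, 0.3)  # simulated latency, ms
    samples, converged = run_until_stable(fake_invoke)
    print(len(samples), converged,
          [round(percentile(samples, p), 1) for p in PERCENTILES])
```

Checking only every `batch` repetitions keeps the cost of recomputing the CIs low; a tighter `r` or a longer `streak` trades extra repetitions for more confidence in the reported percentiles.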

Paper Structure (32 sections, 2 equations, 15 figures, 3 tables)

Introduction
Background
Serverless computing
Serverless function performance
Motivation
Our Performance Testing Approach: SCOPE
Key characteristics
Overview of SCOPE
Stopping criterion of SCOPE
Implementations of SCOPE
An illustrating example of applying SCOPE
Experimental evaluation
Research questions
Baselines
Dataset
...and 17 more sections

Figures (15)

Figure 1: The process of using serverless computing.
Figure 2: The workflow of SCOPE.
Figure 3: RQ2: Changes in metric values under different constraints $r\%$ for SCOPE 1 (mean results in cold and warm starts for tested functions).
Figure 4: RQ2: Changes in metric values under different constraints $r\%$ for SCOPE 2 (mean results in cold and warm starts for tested functions).
Figure 5: RQ2: Changes in metric values under different constraints $r\%$ for SCOPE 3 (mean results in cold and warm starts for tested functions).
...and 10 more figures
