Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications

Tim C. Rese; Nils Japke; Sebastian Koch; Tobias Pfandzelter; David Bermbach

TL;DR

The paper tackles the challenge of detecting performance regressions in continuously deployed FaaS applications amid high platform variability. It adapts the duet benchmarking concept to FaaS (DuetFaaS) by running two function versions in parallel on a single cloud function instance, thereby reducing temporal and hardware variance. The proof-of-concept on AWS Lambda shows that DuetFaaS achieves equal or smaller confidence intervals in 98.41% of cases and can reach reliable results with as few as 100 invocations, markedly reducing time and cost compared to traditional and randomized sequential approaches. These findings support integrating DuetFaaS into CI/CD pipelines to enable faster, more reliable evaluation of releases in production-like FaaS environments, with planned extensions to other providers and cost analysis.

Abstract

In a continuous deployment setting, Function-as-a-Service (FaaS) applications frequently receive updated releases, each of which can cause a performance regression. While continuous benchmarking, i.e., comparing benchmark results of the updated and the previous version, can detect such regressions, the performance variability of FaaS platforms necessitates thousands of function calls, making continuous benchmarking time-intensive and expensive. In this paper, we propose DuetFaaS, an approach that adapts duet benchmarking to FaaS applications. With DuetFaaS, we deploy two versions of a FaaS function in a single cloud function instance and execute them in parallel to reduce the impact of platform variability. We evaluate our approach against state-of-the-art approaches, running on AWS Lambda. Overall, DuetFaaS requires fewer invocations to accurately detect performance regressions than other state-of-the-art approaches. In 98.41% of evaluated cases, our approach provides an equal or smaller confidence interval size. DuetFaaS achieves an interval size reduction in 59.06% of all evaluated sample sizes when compared to the competing approaches.
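The core idea of running both versions side by side and comparing them as a pair can be illustrated with a minimal sketch. It is not the DuetFaaS implementation: the names `timed`, `duet_invoke`, and `relative_change` are hypothetical, and Python threads stand in for the separate cores of a single function instance, so the sketch shows only the pairing logic, not true core isolation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def timed(fn: Callable[[], object]) -> float:
    """Run fn once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def duet_invoke(old_version: Callable[[], object],
                new_version: Callable[[], object]) -> Tuple[float, float]:
    """Start both versions at nearly the same time and return one duration pair.

    Assumption: threads are a stand-in for parallel execution on separate
    cores; CPU-bound Python code would need separate processes instead.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        old_future = pool.submit(timed, old_version)
        new_future = pool.submit(timed, new_version)
        return old_future.result(), new_future.result()


def relative_change(old_duration: float, new_duration: float) -> float:
    """Relative slowdown of the new version; positive values indicate a regression."""
    return (new_duration - old_duration) / old_duration
```

Because both measurements in a pair are taken at the same time in the same environment, platform noise tends to affect them similarly and largely cancels out in `relative_change`, which is the intuition behind needing fewer invocations.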

Paper Structure (18 sections, 5 figures)

Introduction
Current State of Function-as-a-Service (FaaS) Release Benchmarking
Function-as-a-Service (FaaS)
Continuous Benchmarking
Continuous FaaS Application Benchmarking
Result Analysis
Duet Benchmarking FaaS Application Releases
Parallel Isolated Execution
Realistic Environment
Pipeline Integration
Evaluation
Proof-of-Concept Prototype
Experiment Design
Performance Regression Detection
Impact of Sample Size
...and 3 more sections

Figures (5)

Figure 1: Traditional FaaS Benchmarking, where function versions are deployed independently of one another. Load generation calls are then made separately to both.
Figure 2: RMIT Functionality within faasterBench, where a wrapper artifact containing both versions determines a random factor. This random factor decides in which order the functions are run, and the experiment is repeated for several trials.
Figure 3: Whenever a change in the repository is made, a deployment pipeline uploads the updated and previous version to multiple FaaS instances. Here, both versions are run on separate cores and in parallel to enable duet benchmarking.
Figure 4: Performance regression detected by each approach in 1,500 benchmark repetitions, along with the $99\%$ confidence interval. Red lines mark the 'true' performance regression of the function. Duet benchmarking provides more accurate intervals than the other approaches. However, it slightly misses the true regression in one of our configurations.
Figure 5: Duet benchmarking consistently produces extremely accurate intervals when only including 100 results, which the other approaches do not achieve even with 1,500 repetitions in some configurations.
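
The confidence intervals in Figures 4 and 5 can be illustrated with a small sketch. This listing does not state how the intervals are computed, so the following assumes a percentile bootstrap of the median relative change; the function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_ci(changes: Sequence[float], confidence: float = 0.99,
                 resamples: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the median of `changes`.

    `changes` holds one relative performance change per benchmark repetition,
    e.g. (new - old) / old for each paired measurement.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(changes)
    # Resample with replacement and record the median of each resample.
    medians: List[float] = sorted(
        statistics.median(rng.choices(changes, k=n)) for _ in range(resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = medians[int(alpha * resamples)]
    upper = medians[min(resamples - 1, int((1.0 - alpha) * resamples))]
    return lower, upper
```

A narrower interval from the same number of repetitions means the regression estimate is more precise, which is the comparison the two figures make across approaches and sample sizes.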
