Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications
Tim C. Rese, Nils Japke, Sebastian Koch, Tobias Pfandzelter, David Bermbach
TL;DR
The paper tackles the challenge of detecting performance regressions in continuously deployed FaaS applications amid high platform variability. It adapts the duet benchmarking concept to FaaS (DuetFaaS) by running two function versions in parallel on a single cloud function instance, thereby reducing temporal and hardware variance. The proof-of-concept on AWS Lambda shows that DuetFaaS achieves equal or smaller confidence intervals in 98.41% of cases and can reach reliable results with as few as 100 invocations, markedly reducing time and cost compared to traditional and randomized sequential approaches. These findings support integrating DuetFaaS into CI/CD pipelines to enable faster, more reliable evaluation of releases in production-like FaaS environments, with planned extensions to other providers and cost analysis.
Abstract
In a continuous deployment setting, Function-as-a-Service (FaaS) applications frequently receive updated releases, each of which can cause a performance regression. While continuous benchmarking, i.e., comparing benchmark results of the updated and the previous version, can detect such regressions, the performance variability of FaaS platforms necessitates thousands of function calls, making continuous benchmarking time-intensive and expensive. In this paper, we propose DuetFaaS, an approach that adapts duet benchmarking to FaaS applications. With DuetFaaS, we deploy two versions of a FaaS function in a single cloud function instance and execute them in parallel to reduce the impact of platform variability. We evaluate our approach against state-of-the-art approaches on AWS Lambda. Overall, DuetFaaS requires fewer invocations to accurately detect performance regressions than other state-of-the-art approaches. In 98.41% of evaluated cases, our approach provides an equal or smaller confidence interval size. DuetFaaS achieves an interval size reduction in 59.06% of all evaluated sample sizes when compared to the competing approaches.
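
The duet idea described above can be illustrated with a minimal Python sketch. This is not the DuetFaaS implementation. The helper names (`timed`, `duet_invoke`, `bootstrap_ci`), the use of threads, the median relative difference as the statistic, and the 95% level are all assumptions made for illustration. Both versions run concurrently inside one process, standing in for one function instance, so each pair of measurements shares the same hardware and time window. A percentile bootstrap then gives a confidence interval for the relative difference.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, payload):
    """Run fn(payload) and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fn(payload)
    return time.perf_counter() - start


def duet_invoke(old_fn, new_fn, payload):
    """Run both versions concurrently in one process; return (old, new) durations.

    Threads keep the sketch short (an assumption, not the paper's mechanism).
    On CPython, CPU-bound code contends for the GIL, so separate processes
    would be a closer stand-in for truly parallel execution.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        old_future = pool.submit(timed, old_fn, payload)
        new_future = pool.submit(timed, new_fn, payload)
        return old_future.result(), new_future.result()


def bootstrap_ci(pairs, resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the median of (new - old) / old over pairs."""
    rng = random.Random(seed)
    diffs = [(new - old) / old for old, new in pairs]
    # Resample the paired differences with replacement and collect the medians.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=len(diffs)))
        for _ in range(resamples)
    )
    lower = medians[int((1 - level) / 2 * resamples)]
    upper = medians[min(resamples - 1, int((1 + level) / 2 * resamples))]
    return lower, upper
```

Under these assumptions, a handler would call `duet_invoke` once per invocation, collect the pairs, and pass them to `bootstrap_ci`. An interval that lies entirely above zero would suggest that the new version is slower, and a narrower interval for the same number of invocations is the kind of improvement the abstract reports.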
