Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines
Saman Akbari, Manfred Hauswirth
TL;DR
This paper critically reviews sampling practices in cloud benchmarking by analyzing 115 recent studies across four dimensions: sampling method, sample origin, sample size, and sample availability. It finds pervasive non-probability sampling, heavy reliance on a single benchmark, and limited access to benchmarks and data, raising concerns about generalizability and reproducibility. To address these problems, the authors propose four methodological guidelines to improve transparency, with examples of adherence: clear sampling rationale, transparent sampling description, acknowledgment of potential sampling bias, and open access to artifacts. The study also provides an open replication package to enable replication and further meta-analysis, and it calls for community-wide development of standardized benchmarks to improve comparability.
Abstract
Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors, as well as by decisions made in the benchmark design. In particular, the sampling strategy chosen by benchmark designers can significantly influence benchmark results. Although this is well known, no systematic approach has yet been devised to make sampling results comparable or to guide benchmark designers in choosing a sampling strategy. To identify systematic problems, we critically review sampling in recent cloud computing research. Our analysis identifies concerning trends: (i) a high prevalence of non-probability sampling, (ii) over-reliance on a single benchmark, and (iii) restricted access to samples. To address these issues and increase transparency in sampling, we propose methodological guidelines for researchers and reviewers. We hope that our work contributes to improving the generalizability, reproducibility, and reliability of research results.
