Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing
Thomas Bodner, Daniel Ritter, Martin Boissier, Tilmann Rabl
TL;DR
Skyrise addresses the lack of end-to-end serverless SQL processing with what the authors describe as the first fully serverless SQL query engine, built on FaaS compute and serverless storage. It applies adaptive, cost-aware techniques, including retriggering of stragglers, caching of intermediate results in serverless storage, and tiered shuffles, to cope with performance variability and storage bottlenecks. In evaluations on the TPC-H benchmark, Skyrise achieves latency and cost competitive with both academic prototypes and commercial cloud systems for terabyte-scale queries, while scaling elastically from zero to large workloads. The work demonstrates the practicality of serverless SQL for analytics and provides an open-source platform for further research and adoption.
Abstract
Serverless computing offers elasticity unmatched by conventional server-based cloud infrastructure. Although modern data processing systems embrace serverless storage, such as Amazon S3, they continue to manage their compute resources as servers. This is challenging for unpredictable workloads and often leaves clusters underutilized. Recent research shows the potential of serverless compute resources, such as cloud functions, for elastic data processing, but also reveals limitations in performance robustness and cost efficiency for long-running workloads. These challenges require holistic approaches across the system stack. However, to the best of our knowledge, there is no end-to-end data processing system built entirely on serverless infrastructure. In this paper, we present Skyrise, our effort towards building the first fully serverless SQL query processor. Skyrise exploits the elasticity of its underlying infrastructure, while alleviating the inherent limitations with a number of adaptive and cost-aware techniques. We show that both Skyrise's performance and cost are competitive with other cloud data systems for terabyte-scale queries of the analytical TPC-H benchmark.
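To make one of the techniques named in the TL;DR more concrete, the following is a minimal Python sketch of straggler retriggering. It is an illustration under stated assumptions, not Skyrise's implementation: threads stand in for cloud function invocations, and the names `invoke_with_retrigger`, `fake_worker`, and `straggler_timeout` are hypothetical. The idea shown is that any invocation that has not returned within a timeout is invoked a second time, and whichever attempt finishes first supplies the result.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def invoke_with_retrigger(fn, inputs, straggler_timeout, max_workers=16):
    """Run fn(x, attempt) per input; re-invoke calls not finished by the timeout.

    Hypothetical helper: threads stand in for cloud function invocations.
    Returns (results in input order, sorted indices that were retriggered).
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    try:
        # First wave: one invocation per input.
        first = {pool.submit(fn, x, 0): i for i, x in enumerate(inputs)}
        done, slow = wait(list(first), timeout=straggler_timeout)
        results = {first[f]: f.result() for f in done if f.exception() is None}

        # Second wave: retrigger every input without a result yet
        # (still running, or failed on its first attempt).
        retriggered = sorted(i for i in first.values() if i not in results)
        racing = {f: first[f] for f in slow}
        for i in retriggered:
            racing[pool.submit(fn, inputs[i], 1)] = i

        # Race original and backup attempts; the first success per input wins.
        remaining = set(racing)
        while remaining:
            finished, remaining = wait(remaining, return_when=FIRST_COMPLETED)
            for f in finished:
                if f.exception() is None:
                    results.setdefault(racing[f], f.result())
            remaining = {f for f in remaining if racing[f] not in results}
    finally:
        # Do not block on losing attempts; they finish in the background.
        pool.shutdown(wait=False)

    missing = [i for i in range(len(inputs)) if i not in results]
    if missing:
        raise RuntimeError(f"all attempts failed for inputs at {missing}")
    return [results[i] for i in range(len(inputs))], retriggered


def fake_worker(x, attempt):
    """Simulated function: only the first attempt on input 3 is slow."""
    if x == 3 and attempt == 0:
        time.sleep(1.0)
    return x * x
```

Calling `invoke_with_retrigger(fake_worker, [1, 2, 3, 4], straggler_timeout=0.2)` should return the squares in input order together with the index of the one retriggered input. A real system would also have to weigh the cost of the duplicate invocation against the latency saved, which this sketch ignores.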
