Intra-Query Runtime Elasticity for Cloud-Native Data Analysis
Xukang Zhang, Huanchen Zhang, Xiaofeng Meng
TL;DR
This work introduces Intra-Query Runtime Elasticity (IQRE), a capability that lets a cloud-native OLAP engine adjust a query's degree of parallelism (DOP) during execution without pausing data processing. It presents Accordion, the first IQRE engine, which combines an auto-tuner backed by a what-if service, a dynamic scheduler, and a vectorized, push-based execution core modeled on Presto, with the goal of minimizing compute while meeting latency constraints. The paper details the required architectural changes, buffer redesigns, intra-task and intra-stage DOP tuning, DOP switching for partitioned hash joins, and an elastic shuffle stage, all evaluated on a 21-node AWS cluster, where it reports latency reductions of up to about 73.7%. These results indicate that IQRE can substantially reduce compute costs while meeting latency targets; future work targets heterogeneous hardware, dynamic execution plans, and AI-driven DOP decisions.
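The trade-off an auto-tuner navigates (spend as little compute as possible while still finishing on time) can be made concrete with a small what-if model. The following is a minimal illustration, not Accordion's algorithm: it assumes an Amdahl-style cost model in which a hypothetical `serial_fraction` of the remaining work does not parallelize, and it picks the smallest DOP whose predicted remaining time fits the latency budget. `WhatIfModel`, `choose_dop`, and their parameters are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhatIfModel:
    """Hypothetical Amdahl-style estimate of the work a query has left."""
    remaining_single_thread_s: float  # assumed: predicted time to finish at DOP = 1
    serial_fraction: float            # assumed: share of that work that does not parallelize

    def remaining_time(self, dop: int) -> float:
        """Predicted seconds to finish if the query runs at `dop` from now on."""
        s = self.serial_fraction
        return self.remaining_single_thread_s * (s + (1.0 - s) / dop)

    def compute_cost(self, dop: int) -> float:
        """Predicted thread-seconds consumed at `dop` until completion."""
        return dop * self.remaining_time(dop)


def choose_dop(model: WhatIfModel, time_budget_s: float, max_dop: int) -> Optional[int]:
    """Smallest DOP predicted to finish within the budget, or None if none can.

    Under this model compute_cost never decreases as DOP grows, so the
    smallest feasible DOP is also the cheapest one.
    """
    for dop in range(1, max_dop + 1):
        if model.remaining_time(dop) <= time_budget_s:
            return dop
    return None


model = WhatIfModel(remaining_single_thread_s=100.0, serial_fraction=0.1)
print(choose_dop(model, time_budget_s=30.0, max_dop=64))  # 5
print(choose_dop(model, time_budget_s=5.0, max_dop=64))   # None
```

With 100 s of single-thread work, a 10% serial fraction, and a 30 s budget, the model predicts 32.5 s at DOP 4 and 28 s at DOP 5, so it selects 5. A 5 s budget is infeasible at any DOP because the serial part alone takes 10 s. A real what-if service would re-estimate these inputs as the query progresses, which is what makes changing the DOP mid-query useful.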
Abstract
We propose the concept of Intra-Query Runtime Elasticity (IQRE) for cloud-native data analysis. IQRE enables a cloud-native OLAP engine to dynamically adjust a query's Degree of Parallelism (DOP) during execution. This capability allows users to utilize cloud computing resources more cost-effectively. We present Accordion, the first IQRE query engine. Accordion can adjust the parallelism of a query at any point during query execution without pausing data processing. It features a user-friendly interface and an auto-tuner backed by a "what-if" service to allow users to adjust the DOP according to their query latency constraints. The design of Accordion follows the execution model in Presto, an open-source distributed SQL query engine developed at Meta. We present the implementation of Accordion and demonstrate its ease of use, showcasing how it enables users to minimize compute resource consumption while meeting their query time constraints.
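One way to picture changing a task's parallelism "without pausing data processing" is a pool of worker threads that pull input chunks from a shared queue: scaling up just starts more consumers, and scaling down lets surplus workers retire between chunks while the others keep running. This is a minimal Python sketch under that assumption, with the input fully materialized up front; `ElasticTask`, `set_dop`, and `wait` are hypothetical and do not reflect Accordion's actual driver, buffer, or shuffle design.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


class ElasticTask:
    """Hypothetical task whose worker-thread count (DOP) can change mid-run.

    Workers pull chunks from a shared queue, so starting or retiring a worker
    never blocks the others. Assumes all input chunks are known up front.
    """

    def __init__(self, chunks: Iterable[Any], process: Callable[[Any], Any]) -> None:
        self._input: "queue.Queue[Any]" = queue.Queue()
        for chunk in chunks:
            self._input.put(chunk)
        self._process = process
        self._lock = threading.Lock()
        self._target = 0  # desired number of workers
        self._active = 0  # workers currently running
        self._threads: List[threading.Thread] = []
        self.results: List[Any] = []

    def set_dop(self, dop: int) -> None:
        """Change the target DOP; any additional workers start immediately."""
        with self._lock:
            self._target = dop
            to_start = max(0, dop - self._active)
            self._active += to_start
        for _ in range(to_start):
            t = threading.Thread(target=self._worker)
            self._threads.append(t)
            t.start()

    def _worker(self) -> None:
        while True:
            with self._lock:
                if self._active > self._target:
                    # Scale-down: retire between chunks, never mid-chunk.
                    self._active -= 1
                    return
            try:
                chunk = self._input.get_nowait()
            except queue.Empty:
                with self._lock:
                    self._active -= 1  # input exhausted
                return
            out = self._process(chunk)
            with self._lock:
                self.results.append(out)

    def wait(self) -> None:
        """Block until every worker started so far has exited."""
        for t in list(self._threads):
            t.join()


task = ElasticTask(range(1000), lambda x: x * x)
task.set_dop(2)  # start with two workers
task.set_dop(8)  # scale up while chunks are still being processed
task.set_dop(3)  # scale down; surplus workers retire after their current chunk
task.wait()
print(len(task.results))  # 1000
```

Retiring only between chunks is what keeps scale-down safe in this sketch: no chunk is abandoned mid-processing, so the result set is complete regardless of when `set_dop` is called, as long as the target stays above zero. Operators with shared state, such as the partitioned hash joins mentioned in the TL;DR, need more than this, since changing the DOP there also changes how data is partitioned.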
