Intra-Query Runtime Elasticity for Cloud-Native Data Analysis

Xukang Zhang, Huanchen Zhang, Xiaofeng Meng

TL;DR

This work introduces Intra-Query Runtime Elasticity (IQRE), a capability that lets cloud-native OLAP engines adjust a query's degree of parallelism (DOP) during execution without pausing data processing. It presents Accordion, the first IQRE engine, which features an auto-tuner backed by a what-if service, a dynamic scheduler, and a vectorized push-based core inspired by Presto, with the goal of minimizing compute while meeting latency constraints. The paper details architectural changes, buffer redesigns, intra-task and intra-stage DOP tuning, DOP switching for partitioned hash joins, and an elastic shuffle stage, all evaluated on a 21-node AWS cluster, with latency reductions of up to ~73.7%. These results indicate that IQRE can reduce compute costs while meeting latency targets; future work targets heterogeneous hardware, dynamic execution plans, and AI-driven DOP decisions.

Abstract

We propose the concept of Intra-Query Runtime Elasticity (IQRE) for cloud-native data analysis. IQRE enables a cloud-native OLAP engine to dynamically adjust a query's Degree of Parallelism (DOP) during execution. This capability allows users to utilize cloud computing resources more cost-effectively. We present Accordion, the first IQRE query engine. Accordion can adjust the parallelism of a query at any point during query execution without pausing data processing. It features a user-friendly interface and an auto-tuner backed by a "what-if" service to allow users to adjust the DOP according to their query latency constraints. The design of Accordion follows the execution model in Presto, an open-source distributed SQL query engine developed at Meta. We present the implementation of Accordion and demonstrate its ease of use, showcasing how it enables users to minimize compute resource consumption while meeting their query time constraints.

Paper Structure

This paper contains 29 sections, 30 figures, and 2 tables.

Figures (30)

  • Figure 1: Accordion's Main Interface -- it includes a SQL input box on the left and the query execution progress tracking box on the right.
  • Figure 2: Accordion's Controller Interface -- it is composed of three sections: the query plan display box, the auto-tuner box, and the stage information box.
  • Figure 3: Architecture of Presto.
  • Figure 4: Distributed physical plan of example query.
  • Figure 5: Partial distributed execution plan of the distributed physical plan.
  • ...and 25 more figures