Enhancing Computation Pushdown for Cloud OLAP Databases

Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker

TL;DR

This work tackles the network bottleneck in storage-disaggregated cloud OLAP systems by proposing Adaptive pushdown, a runtime mechanism that uses an arbitrator to decide whether each pushdown request executes in storage or is pushed back to the computation layer. It also derives a general principle for identifying pushdown-amenable operators and introduces two new pushdown operators, selection bitmap and distributed data shuffle. On TPC-H, Adaptive pushdown achieves up to 1.9x speedup over the No pushdown and Eager pushdown baselines, and the new operators further accelerate query execution by up to 3.0x. The results give guidance for dynamic pushdown decisions and operator design in multi-tenant storage-disaggregated architectures, with FPDB as a concrete open-source platform.

Abstract

The network is a major bottleneck in modern cloud databases that adopt a storage-disaggregation architecture. Computation pushdown is a promising solution: it offloads some computation tasks to the storage layer to reduce network traffic. Existing cloud OLAP systems decide statically, during query optimization, whether to push down computation, and do not consider the storage layer's computational capacity and load. In addition, there is no general principle that determines which operators are amenable to pushdown. Existing systems design and implement pushdown features empirically, and each ends up supporting only a limited set of pushdown operators. In this paper, we first design Adaptive pushdown, a new mechanism that avoids throttling storage-layer computation during pushdown by pushing a request back to the computation layer at runtime if storage-layer computational resources are insufficient. Moreover, we derive a general principle for identifying pushdown-amenable computational tasks by summarizing common patterns of pushdown capabilities in existing systems. We propose two new pushdown operators, namely selection bitmap and distributed data shuffle. Evaluation results on TPC-H show that Adaptive pushdown achieves up to 1.9x speedup over both the No pushdown and Eager pushdown baselines, and that the new pushdown operators further accelerate query execution by up to 3.0x.
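
The accept-or-push-back decision described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or FPDB code: the `StorageNode` class, its slot-based capacity model, and the `arbitrate` function are hypothetical names chosen for illustration, and a real arbitrator would likely weigh more than a free-slot count.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    """Hypothetical storage node with a fixed number of execution slots."""
    total_slots: int
    busy_slots: int = 0

    def has_capacity(self) -> bool:
        # Assumption: capacity is modeled as free execution slots only.
        return self.busy_slots < self.total_slots


def arbitrate(node: StorageNode) -> str:
    """Accept the request if a slot is free, otherwise push it back."""
    if node.has_capacity():
        node.busy_slots += 1  # the request now occupies a storage slot
        return "pushdown"
    # Storage is saturated: the computation layer reads the raw data
    # and evaluates the operator itself.
    return "pushback"


node = StorageNode(total_slots=2)
decisions = [arbitrate(node) for _ in range(3)]
print(decisions)  # ['pushdown', 'pushdown', 'pushback']
```

With two slots and three concurrent requests, the third request is pushed back instead of queueing behind the first two, which is the behavior that separates this scheme from a static, optimizer-time decision.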

Paper Structure (23 sections, 12 equations, 15 figures, 1 table, 1 algorithm)

This paper contains 23 sections, 12 equations, 15 figures, 1 table, 1 algorithm.

Figures (15)

  • Figure 1: Performance of No pushdown and Eager pushdown on sample queries (Q1 and Q19 in TPC-H).
  • Figure 2: Architecture of Adaptive pushdown. The Adaptive Pushdown Arbitrator determines whether to accept a pushdown request for execution or push it back.
  • Figure 3: Selection Bitmap Pushdown (from the Storage Layer). The selection bitmap constructed at storage can be used to filter cached data at the computation layer.
  • Figure 4: Selection Bitmap Pushdown (from the Computation Layer). Storage can use the compute-layer selection bitmap to perform filtering without touching the predicate column.
  • Figure 5: Distributed Data Shuffle Pushdown. Data is directly redistributed to the target compute node from the storage layer.
  • ...and 10 more figures
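
The ideas behind the captions for Figures 3 to 5 can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's implementation: the function names, the column data, and the modulo partitioning rule are hypothetical, and plain Python lists stand in for columnar storage.

```python
def build_bitmap(predicate_column, predicate):
    """Evaluate a predicate once and record one boolean per row."""
    return [predicate(value) for value in predicate_column]


def apply_bitmap(column, bitmap):
    """Filter any column of the same table using an existing bitmap."""
    return [value for value, keep in zip(column, bitmap) if keep]


def shuffle_partition(rows, key_index, num_compute_nodes):
    """Split rows by key so each partition can go to one compute node.

    Assumption: integer keys and modulo partitioning, for illustration.
    """
    partitions = [[] for _ in range(num_compute_nodes)]
    for row in rows:
        partitions[row[key_index] % num_compute_nodes].append(row)
    return partitions


# Storage side: the predicate column is scanned where it lives, and only
# the bitmap needs to be sent to the computation layer.
quantity_at_storage = [5, 42, 17, 30]
bitmap = build_bitmap(quantity_at_storage, lambda q: q > 20)

# Compute side: a cached column is filtered without re-evaluating the
# predicate and without fetching the predicate column.
price_in_cache = [1.0, 2.5, 3.0, 4.5]
filtered_price = apply_bitmap(price_in_cache, bitmap)
print(bitmap)          # [False, True, False, True]
print(filtered_price)  # [2.5, 4.5]

# Shuffle pushdown: storage partitions rows for two compute nodes directly.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
partitions = shuffle_partition(rows, key_index=0, num_compute_nodes=2)
print(partitions)  # [[(2, 'b'), (4, 'd')], [(1, 'a'), (3, 'c')]]
```

The same `apply_bitmap` call covers the opposite direction in Figure 4: if the computation layer builds the bitmap from a cached predicate column, storage can apply it to the columns it holds without touching the predicate column.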