Hydro: Adaptive Query Processing of ML Queries

Gaurav Tarlok Kakkar, Jiashen Cao, Aubhro Sengupta, Joy Arulraj, Hyesoon Kim

TL;DR

Hydro tackles the challenge of optimizing ML-centric queries by eliminating reliance on static UDF statistics and applying adaptive query processing (AQP) during execution. It combines the Eddy operator for dynamic predicate routing with Laminar for hardware-aware parallelism, enabling data-aware load balancing and reuse-aware routing. The approach includes a warmup phase, per-batch routing metadata, and batch-based execution to manage expensive UDFs without compromising accuracy. Across four diverse use cases, Hydro achieves up to $11.52\times$ speedup over a static baseline, demonstrating practical impact for ML workloads in a DBMS and highlighting its potential for scalable ML analytics within databases.

Abstract

Query optimization in relational database management systems (DBMSs) is critical for fast query processing. The query optimizer relies on precise selectivity and cost estimates to effectively optimize queries prior to execution. While this strategy is effective for relational DBMSs, it is not sufficient for DBMSs tailored for processing machine learning (ML) queries. In ML-centric DBMSs, query optimization is challenging for two reasons. First, the performance bottleneck of the queries shifts to user-defined functions (UDFs) that often wrap around deep learning models, making it difficult to accurately estimate UDF statistics without profiling the query. This leads to inaccurate statistics and sub-optimal query plans. Second, the optimal query plan for ML queries is data-dependent, necessitating DBMSs to adapt the query plan on the fly during execution. So, a static query plan is not sufficient for such queries. In this paper, we present Hydro, an ML-centric DBMS that utilizes adaptive query processing (AQP) for efficiently processing ML queries. Hydro is designed to quickly evaluate UDF-based query predicates by ensuring optimal predicate evaluation order and improving the scalability of UDF execution. By integrating AQP, Hydro continuously monitors UDF statistics, routes data to predicates in an optimal order, and dynamically allocates resources for evaluating predicates. We demonstrate Hydro's efficacy through four illustrative use cases, delivering up to 11.52x speedup over a baseline system.
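The adaptive predicate ordering described above can be illustrated with a small sketch. This is not Hydro's implementation: the `Predicate` class, the `route` function, the `unit_cost` field, and the ranking metric (observed cost per dropped tuple, a classic ordering heuristic for conjunctive predicates) are all assumptions made for illustration. Costs are simulated constants so that the example is deterministic; a real system would time the UDF calls. The warmup batch lets every predicate see unfiltered data, so the initial statistics are not biased by an earlier predicate's filtering.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Predicate:
    """Hypothetical stand-in for an expensive UDF-based predicate."""
    name: str
    fn: Callable[[int], bool]
    unit_cost: float          # simulated per-tuple cost (assumption, not measured)
    seen: int = 0             # tuples routed to this predicate so far
    passed: int = 0           # tuples that satisfied it
    spent: float = 0.0        # accumulated simulated cost

    def rank(self) -> float:
        """Observed cost per dropped tuple; lower means 'evaluate earlier'."""
        dropped = self.seen - self.passed
        return self.spent / dropped if dropped else float("inf")

    def evaluate(self, batch: List[int]) -> List[int]:
        """Filter a batch and update the running statistics."""
        survivors = [x for x in batch if self.fn(x)]
        self.seen += len(batch)
        self.passed += len(survivors)
        self.spent += self.unit_cost * len(batch)
        return survivors


def route(batches: Iterable[List[int]], predicates: List[Predicate],
          warmup: int = 1) -> List[int]:
    """Evaluate a conjunction, re-deriving the predicate order per batch."""
    results: List[int] = []
    for i, batch in enumerate(batches):
        if i < warmup:
            # Warmup: every predicate sees the full batch to collect statistics.
            kept = set(batch)
            for p in predicates:
                kept &= set(p.evaluate(batch))
            results.extend(x for x in batch if x in kept)
            continue
        # Adaptive phase: cheapest cost-per-dropped-tuple goes first.
        for p in sorted(predicates, key=Predicate.rank):
            if not batch:
                break
            batch = p.evaluate(batch)
        results.extend(batch)
    return results


def run_demo():
    """Start from a deliberately poor static order and let routing correct it."""
    predicates = [
        Predicate("expensive", lambda x: x % 10 == 0, unit_cost=5.0),
        Predicate("cheap", lambda x: x % 2 == 0, unit_cost=1.0),
    ]
    batches = [list(range(s, s + 10)) for s in range(0, 100, 10)]
    results = route(batches, predicates)
    total_cost = sum(p.spent for p in predicates)
    final_order = [p.name for p in sorted(predicates, key=Predicate.rank)]
    return results, total_cost, final_order


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting, evaluating the expensive predicate first on all 100 tuples would cost 510 simulated units, while the adaptive run costs 375 including the warmup batch, because the cheap predicate is moved to the front once statistics are available.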

Paper Structure (20 sections, 1 equation, 14 figures, 1 table)

Figures (14)

  • Figure 1: Query Execution Pipelines -- In static query processing, the predicate ordering is determined based on statistics estimated during query optimization. In contrast, adaptive query processing dynamically governs the predicate ordering during query execution.
  • Figure 2: Detailed AQP execution plan and its internals -- Left shows the execution tree with the AQP executor. Diamonds represent physical processes apart from the main process. All physical queues serve as a medium for communication between data producers and consumers. Rectangles represent the routing policies attached to the corresponding processes.
  • Figure 3: Query plan for UC1 -- query plans with and without predicate reordering for UC1.
  • Figure 4: Routing policy comparison -- execution timelines of the selectivity-driven, score-driven, and cost-driven routing policies. One box represents a time unit.
  • Figure 5: Query processing time for UC1 -- comparison among five system options: no reordering, best reordering, Eddy cost-driven routing, Eddy score-driven routing, and Eddy selectivity-driven routing.
  • ...and 9 more figures