A Novel Technique for Query Plan Representation Based on Graph Neural Nets
Baoming Chang, Amin Kamali, Verena Kantere
TL;DR
This work investigates how different tree-structured representations of query execution plans affect cost estimation and plan selection in ML-based database optimizers. It compares state-of-the-art tree models and introduces BiGG, a novel bidirectional GNN framework with GRU-based aggregation for query plan trees. Experiments on TPC-DS workloads show that BiGG improves cost-estimation accuracy and performs competitively for plan selection, though gains in cost estimation do not always translate into better plan selection. The work demonstrates the potential of graph-based representations for query plan modeling and outlines directions for integrating BiGG into full optimizers.
Abstract
Learning representations for query plans plays a pivotal role in machine learning-based query optimizers of database management systems. To this end, particular model architectures have been proposed in the literature to transform tree-structured query plans into representations in formats learnable by downstream machine learning models. However, existing research rarely compares and analyzes the query plan representation capabilities of these tree models and their direct impact on the performance of the overall optimizer. To address this problem, we perform a comparative study to explore the effect of using different state-of-the-art tree models on the optimizer's cost estimation and plan selection performance in relatively complex workloads. Additionally, we explore the possibility of using graph neural networks (GNNs) in the query plan representation task. We propose a novel tree model, BiGG, employing Bidirectional GNN aggregated by Gated recurrent units (GRUs), and demonstrate experimentally that BiGG provides significant improvements in cost estimation and competitive plan selection performance compared to the state-of-the-art tree models.
