Learned Offline Query Planning via Bayesian Optimization
Jeffrey Tao, Natalie Maus, Haydn Jones, Yimeng Zeng, Jacob R. Gardner, Ryan Marcus
TL;DR
This paper tackles offline query optimization for repetitive analytic workloads by proposing BayesQO, which uses a variational autoencoder to embed query plans into a latent space and applies Bayesian optimization in that space to discover fast plans. The method treats timeouts as censored observations and improves search efficiency by initializing the search from traditional optimizers and from language models fine-tuned across queries. Empirically, BayesQO outperforms online learned optimizers and hint-based baselines across multiple benchmarks, in some cases reducing plan latency substantially. It also remains robust to data drift, and the paper describes strategies for retraining and reoptimization. The work discusses practical implications for using offline optimization in real DBMS deployments and points to future work on integration with broader system design choices and on using learned initializations for rapid adaptation.
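The sketch below makes the idea of treating timeouts as censored observations concrete. It is a minimal sketch under stated assumptions, not BayesQO's implementation. It assumes the surrogate model predicts a Gaussian mean and standard deviation for each plan's log-latency. The helper names (normal_logpdf, normal_logsf, censored_log_likelihood) are hypothetical.

A plan that finishes contributes its predicted density. A plan that hits the timeout only tells us that its latency exceeds the timeout, so it contributes the tail probability P(Y > timeout), as in a standard Tobit-style right-censored likelihood.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers. The Gaussian model of log-latency is an assumption
# for illustration, not the paper's exact surrogate or likelihood.


def normal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(c: float, mu: float, sigma: float) -> float:
    """Log of P(Y > c) for Y ~ N(mu, sigma^2).

    Uses erfc directly, which is adequate for moderate z. Far in the tail,
    erfc underflows to 0 and a log-space approximation would be needed.
    """
    z = (c - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_log_likelihood(
    observations: Sequence[Tuple[float, bool]],
    predictions: Sequence[Tuple[float, float]],
) -> float:
    """Sum the log-likelihood of plan runs, treating timeouts as right-censored.

    observations: (y, timed_out) pairs. y is the observed log-latency, or the
        log of the timeout if the run was cut off.
    predictions: (mu, sigma) from the surrogate for the same plans.
    """
    if len(observations) != len(predictions):
        raise ValueError("observations and predictions must align")
    total = 0.0
    for (y, timed_out), (mu, sigma) in zip(observations, predictions):
        if timed_out:
            # We only know that the true latency exceeds the timeout.
            total += normal_logsf(y, mu, sigma)
        else:
            # A completed run gives an exact (noisy) latency measurement.
            total += normal_logpdf(y, mu, sigma)
    return total


# Usage with illustrative numbers: a 60 s timeout on a log scale.
# The second plan was cut off at the timeout.
timeout = math.log(60.0)
obs = [(math.log(12.0), False), (timeout, True)]
preds = [(math.log(10.0), 0.5), (math.log(90.0), 0.5)]
print(censored_log_likelihood(obs, preds))
```

Under this likelihood, a surrogate that predicts a timed-out plan to be slow is rewarded. A surrogate that predicts it to be fast is penalized. The model never has to pretend that the timeout value is the true latency.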
Abstract
Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, normally well below the time it takes to execute a single instance of a query. If a given query is going to be executed thousands of times, could it be worth investing significantly more optimization time? In contrast to traditional online query optimizers, we propose an offline query optimizer that searches a wide variety of plans and incorporates query execution as a primitive. Our offline query optimizer combines variational auto-encoders with Bayesian optimization to find optimized plans for a given query. We compare our technique to the optimal plans possible with PostgreSQL and recent RL-based systems over several datasets, and show that our technique finds faster query plans.
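Whether extra optimization time is worth it comes down to an amortization argument. Suppose offline optimization costs T seconds in total, including the time spent executing candidate plans. Suppose it also lowers a query's latency from L_base to L_opt. The search then pays for itself after T / (L_base - L_opt) executions.

The sketch below is a minimal illustration of that arithmetic. The function names are hypothetical, and the numbers are illustrative, not results from the paper.

```python
import math


def break_even_executions(optimization_seconds: float,
                          baseline_latency: float,
                          optimized_latency: float) -> float:
    """Executions needed before the offline search pays for itself.

    Returns math.inf if the optimized plan is not faster than the baseline.
    """
    savings_per_run = baseline_latency - optimized_latency
    if savings_per_run <= 0:
        return math.inf
    return optimization_seconds / savings_per_run


def net_savings(optimization_seconds: float,
                baseline_latency: float,
                optimized_latency: float,
                executions: int) -> float:
    """Total seconds saved after `executions` runs, net of optimization cost."""
    return executions * (baseline_latency - optimized_latency) - optimization_seconds


# Illustrative numbers only: one hour of search, 10 s -> 4 s per run.
print(break_even_executions(3600.0, 10.0, 4.0))
print(net_savings(3600.0, 10.0, 4.0, 5000))
```

For example, break_even_executions(3600.0, 10.0, 4.0) returns 600.0. An hour of search that saves 6 seconds per run breaks even after 600 runs, which is small compared with a query executed thousands of times.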
