A Query Optimization Method Utilizing Large Language Models
Zhiming Yao, Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen
TL;DR
This work tackles the challenge of efficient query optimization in DBMS by replacing the traditional search and cost-model components with two fine-tuned LLMs. The LLMOpt framework introduces $LLM^{G}$ for generating plan candidates and $LLM^{S}$ for global, list-wise plan selection, trained offline on database statistics and execution results. Across JOB, JOB-EXT, and Stack benchmarks, both LLMOpt(G) and LLMOpt(S) outperform PostgreSQL, BAO, and HybridQO, with $LLM^S$ delivering the best practical end-to-end latency. The findings suggest that LLMs, when properly fine-tuned with domain data, can provide robust, scalable, and efficient query optimization, with potential for further acceleration and improved integration in real systems.
Abstract
Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
