Table of Contents
Fetching ...

A Query Optimization Method Utilizing Large Language Models

Zhiming Yao, Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen

TL;DR

This work tackles the challenge of efficient query optimization in DBMS by replacing the traditional search and cost-model components with two fine-tuned LLMs. The LLMOpt framework introduces $LLM^{G}$ for generating plan candidates and $LLM^{S}$ for global, list-wise plan selection, trained offline on database statistics and execution results. Across JOB, JOB-EXT, and Stack benchmarks, both LLMOpt(G) and LLMOpt(S) outperform PostgreSQL, BAO, and HybridQO, with $LLM^S$ delivering the best practical end-to-end latency. The findings suggest that LLMs, when properly fine-tuned with domain data, can provide robust, scalable, and efficient query optimization, with potential for further acceleration and improved integration in real systems.

Abstract

Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.

A Query Optimization Method Utilizing Large Language Models

TL;DR

This work tackles the challenge of efficient query optimization in DBMS by replacing the traditional search and cost-model components with two fine-tuned LLMs. The LLMOpt framework introduces for generating plan candidates and for global, list-wise plan selection, trained offline on database statistics and execution results. Across JOB, JOB-EXT, and Stack benchmarks, both LLMOpt(G) and LLMOpt(S) outperform PostgreSQL, BAO, and HybridQO, with delivering the best practical end-to-end latency. The findings suggest that LLMs, when properly fine-tuned with domain data, can provide robust, scalable, and efficient query optimization, with potential for further acceleration and improved integration in real systems.

Abstract

Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.

Paper Structure

This paper contains 34 sections, 3 equations, 6 figures, 5 tables, 4 algorithms.

Figures (6)

  • Figure 1: The Search-and-Select Pipeline Query Optimization
  • Figure 2: Limitations of Existing Methods — (a) Heuristic search strategies inefficiently navigate large query plan spaces, compounded by (b) suboptimal plan selection due to inaccurate cost models.
  • Figure 3: Framework of LLMOpt, utilizing LLMs for candidate generation and candidate selection.
  • Figure 4: An example of collecting labels for LLMOpt by searching for candidates, simplify these plans and serializing plans to sequential hints.
  • Figure 5: An Example of LLM Input and Output.
  • ...and 1 more figures

Theorems & Definitions (2)

  • Definition 4.1
  • Definition 4.2