Table of Contents
Fetching ...

The Unreasonable Effectiveness of LLMs for Query Optimization

Peter Akioyamen, Zixuan Yi, Ryan Marcus

TL;DR

Surprisingly, it is shown that LLM embeddings of query text contain useful semantic information for query optimization, and a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems.

Abstract

Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for query optimization. Specifically, we show that a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems. Although we only present some preliminary results, an LLM-powered query optimizer could provide significant benefits, both in terms of performance and simplicity.

The Unreasonable Effectiveness of LLMs for Query Optimization

TL;DR

Surprisingly, it is shown that LLM embeddings of query text contain useful semantic information for query optimization, and a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems.

Abstract

Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for query optimization. Specifically, we show that a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems. Although we only present some preliminary results, an LLM-powered query optimizer could provide significant benefits, both in terms of performance and simplicity.

Paper Structure

This paper contains 19 sections, 5 figures.

Figures (5)

  • Figure 1: LLMSteer system model.
  • Figure 2: Empirical CDF of latency across cross-validation testing workloads by steering strategy. Note that the latency axis is set to $\log$ scale.
  • Figure 3: Mean LLMSteer performance on 10-fold cross-validation testing workloads.
  • Figure 4: Aggregate confusion matrix of LLMSteer across cross-validation testing workloads.
  • Figure 5: Mean total latency of LLMSteer trained on augmented syntaxes across 10-fold cross-validation testing workloads. Syntax A represents original queries, Syntax B represents formatted queries with spaced indention, Syntax C represents formatted queries with tabbed indentation.