
GenRewrite: Query Rewriting via Large Language Models

Jie Liu, Barzan Mozafari

TL;DR

GenRewrite presents a holistic system that uses Large Language Models to rewrite SQL queries beyond traditional rules. It introduces Natural Language Rewrite Rules (NLR2s) to transfer knowledge between queries and employs a counterexample-guided correction loop that repairs semantic and syntactic errors while reducing LLM costs. Through bottleneck-aware prompting, plan-based NLR2 dominance, and NLR2 grouping, GenRewrite achieves significantly higher speedups and equivalence rates than both baseline LLM approaches and rule-based methods across TPC-DS, JOB, and their SQLStorm variants. The approach demonstrates practical viability for recurrent, high-cost workloads, with a thorough evaluation of performance, cost, and robustness, underscoring its potential as a scalable augmentation to existing database optimizers.

Abstract

Query rewriting is an effective technique for refining poorly written queries before they reach the query optimizer. However, manual rewriting does not scale: it is error-prone and requires deep expertise. Traditional query rewriting algorithms fall short too: rule-based approaches fail to generalize to new query patterns, while synthesis-based methods struggle with complex queries. Fortunately, Large Language Models (LLMs) already possess broad knowledge and advanced reasoning capabilities, making them a promising solution to these longstanding challenges. In this paper, we present GenRewrite, the first holistic system that leverages LLMs for query rewriting beyond traditional rules. We introduce the notion of Natural Language Rewrite Rules (NLR2s), which serve both as hints for the LLM and as a means of transferring knowledge from rewriting one query to another, allowing GenRewrite to become smarter and more effective over time. We present a novel counterexample-guided technique that iteratively corrects syntactic and semantic errors in the rewritten query, significantly reducing LLM costs and the manual effort required for verification. Across the standard TPC-DS and JOB benchmarks and their SQLStorm-generated variants, GenRewrite consistently optimizes more queries than all baselines at every speedup threshold. At the ≥2x threshold on TPC-DS, GenRewrite improves 25 queries (1.35x more than LLM-driven baselines and 2.6x more than LLM-enhanced rule-based baselines), and the gap widens further on TPC-DS (SQLStorm). On JOB and its SQLStorm variant, where queries are simpler, absolute gains are smaller, but GenRewrite still leads by a notable margin.
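The counterexample-guided correction loop can be illustrated with a minimal sketch. It assumes, for illustration only, that counterexamples come from running both queries on a few small SQLite test databases, and that the LLM is a plain prompt-in, SQL-out function. The names `llm`, `result_bag`, `check`, and `correct_rewrite` are hypothetical, and this is not GenRewrite's implementation. Execution errors are fed back as syntactic feedback; differing result multisets are fed back as semantic counterexamples. Matching results on test data is evidence of equivalence, not a proof.

```python
import sqlite3
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical LLM interface (assumption): takes a prompt string, returns SQL text.
# Any chat-completion client could be wrapped to fit this shape.
LLM = Callable[[str], str]


def result_bag(conn: sqlite3.Connection, sql: str) -> Counter:
    """Run `sql` and return its rows as a multiset (row order ignored, duplicates kept)."""
    return Counter(conn.execute(sql).fetchall())


def check(original: str, candidate: str,
          dbs: Dict[str, sqlite3.Connection]) -> Optional[str]:
    """Return feedback text if `candidate` fails on any test database, else None."""
    for name, conn in dbs.items():
        expected = result_bag(conn, original)  # the original is assumed to run cleanly
        try:
            actual = result_bag(conn, candidate)
        except sqlite3.Error as err:
            # Syntactic (or other execution) error: pass the engine's message back.
            return f"The rewritten query fails to execute: {err}"
        if actual != expected:
            # Semantic error: this test database is a concrete counterexample.
            want = sorted(expected.elements(), key=repr)
            got = sorted(actual.elements(), key=repr)
            return (f"Counterexample on test database {name!r}: the original "
                    f"returns {want}, but the rewrite returns {got}.")
    return None


def correct_rewrite(original: str, candidate: str,
                    dbs: Dict[str, sqlite3.Connection],
                    llm: LLM, max_rounds: int = 3) -> Optional[str]:
    """Repair `candidate` using error/counterexample feedback, with at most `max_rounds` LLM calls.

    Returns a rewrite whose results match `original` on every test database,
    or None if none is found within the budget.
    """
    for attempt in range(max_rounds + 1):
        feedback = check(original, candidate, dbs)
        if feedback is None:
            return candidate
        if attempt == max_rounds:
            break  # repair budget exhausted
        prompt = (f"Original query:\n{original}\n\n"
                  f"Rewritten query:\n{candidate}\n\n"
                  f"{feedback}\n"
                  "Return a corrected rewrite that is equivalent to the original.")
        candidate = llm(prompt)
    return None
```

Comparing results as multisets (`collections.Counter`) ignores row order but still catches dropped or duplicated rows, such as those introduced when an `IN` subquery is flattened with a spurious `DISTINCT`. Queries whose meaning depends on `ORDER BY` would need an order-sensitive comparison instead.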

Paper Structure (26 sections, 2 equations, 8 figures, 25 tables, 1 algorithm)

Figures (8)

  • Figure 1: High-level Workflow of GenRewrite
  • Figure 3: The baseline approach for LLM-based query rewriting (Baseline LLM)
  • Figure 4: A prompt incorporating a selected NLR2.
  • Figure 5: The prompt to predict the group to which the incoming NLR2 belongs
  • Figure 6: The prompt for semantic correction
  • ...and 3 more figures