Table of Contents
Fetching ...

TailorSQL: An NL2SQL System Tailored to Your Query Workload

Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska

TL;DR

TailorSQL introduces a workload-aware NL2SQL system that harnesses historical query logs to generate schema and hint documents, and to tailor document embeddings and retrieval for a given database. By offline extracting hints from past queries and employing a context-aware, class-specific retrieval strategy paired with a Bayesian-optimized context allocator, TailorSQL achieves higher execution accuracy and lower latency compared to non-workload-aware baselines. A bandit-based abstention policy guards against performance degradation when workloads drift, enabling graceful fallback to generic NL2SQL. Across three benchmarks, TailorSQL demonstrates consistent accuracy gains (up to 22.7%) and significant prompt-size reductions (2–15x) while remaining compatible with existing NL2SQL methods.

Abstract

NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM) using the question and retrieved information to generate the SQL query. However, existing NL2SQL techniques miss a key opportunity which is present in real-world settings: NL2SQL is typically applied on existing databases which have already served many SQL queries in the past. The past query workload implicitly contains information which is helpful for accurate NL2SQL translation and is not apparent from the database schema alone, such as common join paths and the semantics of obscurely-named tables and columns. We introduce TailorSQL, a NL2SQL system that takes advantage of information in the past query workload to improve both the accuracy and latency of translating natural language questions into SQL. By specializing to a given workload, TailorSQL achieves up to 2$\times$ improvement in execution accuracy on standardized benchmarks.

TailorSQL: An NL2SQL System Tailored to Your Query Workload

TL;DR

TailorSQL introduces a workload-aware NL2SQL system that harnesses historical query logs to generate schema and hint documents, and to tailor document embeddings and retrieval for a given database. By offline extracting hints from past queries and employing a context-aware, class-specific retrieval strategy paired with a Bayesian-optimized context allocator, TailorSQL achieves higher execution accuracy and lower latency compared to non-workload-aware baselines. A bandit-based abstention policy guards against performance degradation when workloads drift, enabling graceful fallback to generic NL2SQL. Across three benchmarks, TailorSQL demonstrates consistent accuracy gains (up to 22.7%) and significant prompt-size reductions (2–15x) while remaining compatible with existing NL2SQL methods.

Abstract

NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM) using the question and retrieved information to generate the SQL query. However, existing NL2SQL techniques miss a key opportunity which is present in real-world settings: NL2SQL is typically applied on existing databases which have already served many SQL queries in the past. The past query workload implicitly contains information which is helpful for accurate NL2SQL translation and is not apparent from the database schema alone, such as common join paths and the semantics of obscurely-named tables and columns. We introduce TailorSQL, a NL2SQL system that takes advantage of information in the past query workload to improve both the accuracy and latency of translating natural language questions into SQL. By specializing to a given workload, TailorSQL achieves up to 2 improvement in execution accuracy on standardized benchmarks.

Paper Structure

This paper contains 26 sections, 7 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: (A) The input to NL2SQL is a user question over a database which has a certain schema and past query workload, and the desired output is a SQL query. (A) The typical NL2SQL workflow will invoke the LLM with a prompt that includes the user question and the database's schema, which may not include enough information to produce an accurate SQL query. (C) Hints extracted from past queries help the LLM produce more accurate queries.
  • Figure 2: In an offline process, TailorSQL (A) generates schema and query hint documents based on information in the database, then (B) generates an embedding for each document using a combination of the document itself and several pieces of auxiliary information related to the document. (C) At runtime, we compute similarity between the user question's embedding and document embeddings in order to retrieve the most relevant documents, up to a per-document-class token limit. (D) The prompt, which includes the question and information in retrieved documents, is used to invoke an LLM to generate the SQL query.
  • Figure 3: Past filter predicates help interpret column values.
  • Figure 4: Past group-by clauses help indicate an implicit primary key. Note that the SQL query on the right only executes for SQL dialects that allow bare columns in aggregation queries, like SQLite.
  • Figure 5: TailorSQL achieves 12.5%, 10.9% and 22.7% higher match execution accuracy compared to the next best baseline (SBERT). TailorSQL achieves the greatest accuracy gains on FIBEN due to the benchmark's complex queries and cryptically-named schema items, which benefit from query hints. Additionally, to achieve the same accuracy, TailorSQL (constrained) uses 2-15$\times$ times fewer tokens and incurs lower latency than the baselines.
  • ...and 3 more figures