ODIN: A NL2SQL Recommender to Handle Schema Ambiguity

Kapil Vaidya, Abishek Sankararaman, Jialin Ding, Chuan Lei, Xiao Qin, Balakrishnan Narayanaswamy, Tim Kraska

TL;DR

ODIN tackles NL2SQL under schema ambiguity by combining a Generate-Select framework with masked-schema generation, a conformal-prediction-based selector, and a personalization module that learns from user feedback. The Generator creates diverse candidate SQL queries by masking schema elements to reveal alternative interpretations, while the Selector prunes unlikely options with statistical guarantees, preserving the correct query. Personalization, via textual hints and adaptive schema linking, aligns future recommendations with user preferences. Evaluations on AmbiQT and Mod-AmbiQT demonstrate that ODIN increases the likelihood of including the correct query by $1.5$–$2\times$ while reducing the number of presented candidates by $2$–$2.5\times$, outperforming diversity-based baselines.

Abstract

NL2SQL (natural language to SQL) systems translate natural language into SQL queries, allowing users with no technical background to interact with databases and create tools like reports or visualizations. While recent advancements in large language models (LLMs) have significantly improved NL2SQL accuracy, schema ambiguity remains a major challenge in enterprise environments with complex schemas, where multiple tables and columns with semantically similar names often co-exist. To address schema ambiguity, we introduce ODIN, a NL2SQL recommendation engine. Instead of producing a single SQL query given a natural language question, ODIN generates a set of potential SQL queries by accounting for different interpretations of ambiguous schema components. ODIN dynamically adjusts the number of suggestions based on the level of ambiguity, and ODIN learns from user feedback to personalize future SQL query recommendations. Our evaluation shows that ODIN improves the likelihood of generating the correct SQL query by $1.5$–$2\times$ compared to baselines.

Paper Structure

This paper contains 24 sections, 1 theorem, 10 equations, 7 figures, 3 algorithms.

Key Result

theorem 1

Given a calibration dataset $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$ drawn i.i.d. from the same distribution as the test point $(X_{\text{test}}, Y_{\text{test}})$, conformal prediction constructs a prediction set $C(X_{\text{test}})$ using the threshold in eq:quantile for the true outcome $Y_{\text{test}}$ such that $P\big(Y_{\text{test}} \in C(X_{\text{test}})\big) \geq 1 - \alpha$, where $\alpha$ is a user-specified significance level. This guarantee holds regardless of the under…
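
To make the guarantee concrete, the following is a minimal sketch of standard split conformal prediction, not ODIN's actual Selector. It assumes each candidate SQL query has a nonconformity score where higher means less plausible (for example, one minus an LLM-assigned score). The function names, the scores, and the candidate labels `q_a`, `q_b`, `q_c` are all hypothetical. The threshold is the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score, which is the usual finite-sample quantile.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores: nonconformity scores of the *correct* answers on the
    calibration set (hypothetical values here; higher = less plausible).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every candidate.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [cand for cand, score in candidate_scores.items() if score <= threshold]


# Hypothetical calibration scores (n = 7) and significance level alpha = 0.25.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.25)  # 6th smallest: 0.6

# Hypothetical candidate queries for one test question.
candidates = {"q_a": 0.15, "q_b": 0.55, "q_c": 0.8}
kept = prediction_set(candidates, threshold)  # q_c is pruned
```

With this construction the number of retained candidates varies per question: a question whose candidates all score well below the threshold yields a larger set, while a question with one clearly plausible candidate yields a small one.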

Figures (7)

  • Figure 1: Odin Overview
  • Figure 2: Prompt template used to score SQL queries using LLMs
  • Figure 3: Prompt template used to provide hints to LLM
  • Figure 4: Different types of ambiguities in the AmbiQT Benchmark.
  • Figure 5: Execution Match Accuracy ($AvgAcc$) versus the average number of results shown to the user ($AvgResultSize$) across three different ambiguity types for various baselines. Odin can achieve up to twice the accuracy while presenting only half the number of SQL queries to the user compared to the next best baseline.
  • ...and 2 more figures
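
The two axes in the Figure 5 caption can be illustrated with a short sketch. The definitions below are assumptions inferred from the caption, not taken from the paper: accuracy is read as the fraction of questions whose presented set contains the correct query, and result size as the mean number of queries presented. Plain string equality stands in for execution match, and all query labels are hypothetical.

```python
def avg_acc(result_sets, gold_queries):
    """Fraction of questions whose presented set contains the gold query.

    String equality is a stand-in for execution match (an assumption).
    """
    hits = sum(1 for shown, gold in zip(result_sets, gold_queries) if gold in shown)
    return hits / len(gold_queries)


def avg_result_size(result_sets):
    """Mean number of queries presented to the user per question."""
    return sum(len(shown) for shown in result_sets) / len(result_sets)


# Hypothetical recommendations for four questions and their gold queries.
result_sets = [["A", "B"], ["C"], ["D", "E", "F"], ["G", "H"]]
gold_queries = ["B", "C", "X", "G"]

acc = avg_acc(result_sets, gold_queries)   # 3 of 4 sets contain the gold query
size = avg_result_size(result_sets)        # 8 queries over 4 questions
```

Under these definitions, a system improves on both axes when it raises `acc` while lowering `size`, which is the trade-off the caption describes.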

Theorems & Definitions (1)

  • theorem 1: Conformal Prediction Guarantee, vovk2005line