ODIN: A NL2SQL Recommender to Handle Schema Ambiguity
Kapil Vaidya, Abishek Sankararaman, Jialin Ding, Chuan Lei, Xiao Qin, Balakrishnan Narayanaswamy, Tim Kraska
TL;DR
ODIN tackles NL2SQL under schema ambiguity by combining a Generate-Select framework with masked-schema generation, a conformal-prediction-based selector, and a personalization module that learns from user feedback. The Generator creates diverse candidate SQL queries by masking schema elements to reveal alternative interpretations, while the Selector prunes unlikely candidates with a statistical guarantee that the correct query is retained with high probability. Personalization, via textual hints and adaptive schema linking, aligns future recommendations with user preferences. Evaluations on AmbiQT and Mod-AmbiQT show that ODIN increases the likelihood of including the correct query by $1.5$–$2\times$ while reducing the number of presented candidates by $2$–$2.5\times$, outperforming diversity-based baselines.
Abstract
NL2SQL (natural language to SQL) systems translate natural language into SQL queries, allowing users with no technical background to interact with databases and create tools like reports or visualizations. While recent advancements in large language models (LLMs) have significantly improved NL2SQL accuracy, schema ambiguity remains a major challenge in enterprise environments with complex schemas, where multiple tables and columns with semantically similar names often co-exist. To address schema ambiguity, we introduce ODIN, an NL2SQL recommendation engine. Instead of producing a single SQL query for a natural language question, ODIN generates a set of candidate SQL queries that account for different interpretations of ambiguous schema components. ODIN dynamically adjusts the number of suggestions based on the level of ambiguity and learns from user feedback to personalize future SQL query recommendations. Our evaluation shows that ODIN improves the likelihood of generating the correct SQL query by $1.5$–$2\times$ compared to baselines.
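To make the idea of a suggestion set whose size adapts to ambiguity more concrete, the following is a minimal sketch of a generic split-conformal selection step. It is not ODIN's actual implementation: the function names (`conformal_threshold`, `select_candidates`), the assumption that each candidate SQL query carries a plausibility score in $[0, 1]$, and the use of $1 - \text{score}$ as the nonconformity measure are all hypothetical choices made for illustration. Under the usual exchangeability assumption, and provided the correct query is among the generated candidates, a threshold calibrated this way retains the correct query with probability at least $1 - \alpha$.

```python
import math


def conformal_threshold(calibration_nonconformity, alpha):
    """Split-conformal threshold (illustrative, not ODIN's code).

    calibration_nonconformity: one nonconformity value per calibration
        question, computed for the known-correct SQL query
        (assumed here to be 1 - plausibility score).
    alpha: tolerated miss rate, e.g. 0.1 for roughly 90% coverage.
    """
    n = len(calibration_nonconformity)
    if n == 0:
        raise ValueError("need at least one calibration example")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep everything.
        return math.inf
    return sorted(calibration_nonconformity)[rank - 1]


def select_candidates(candidates, threshold):
    """Keep candidates whose nonconformity is within the threshold.

    candidates: list of (sql_text, score) pairs, higher score = more plausible.
    Returns the kept pairs, most plausible first.
    """
    kept = [(sql, score) for sql, score in candidates if 1.0 - score <= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical calibration data and candidate scores.
    calibration = [0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.55, 0.60, 0.70]
    tau = conformal_threshold(calibration, alpha=0.2)
    candidates = [
        ("SELECT name FROM customers", 0.90),
        ("SELECT name FROM clients", 0.60),
        ("SELECT title FROM accounts", 0.10),
    ]
    for sql, score in select_candidates(candidates, tau):
        print(f"{score:.2f}  {sql}")
```

In this sketch an unambiguous question, where one candidate dominates, yields a short list, while an ambiguous question with several comparably scored candidates yields a longer one, which is the adaptive behavior the abstract describes.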
