Node-Level Uncertainty Estimation in LLM-Generated SQL
Hilaf Hasson, Ruocheng Guo
TL;DR
The paper tackles errors in LLM-generated SQL by estimating uncertainty at the level of individual nodes in the query's abstract syntax tree (AST). It introduces a semantically grounded labeling scheme for node-level ground truth and a rich 72-feature representation of each node, on which a gradient-boosted tree classifier is trained to produce calibrated per-node error probabilities. Across diverse databases and datasets, the method substantially outperforms token-logprob baselines (average AUC improves by 27.44%) and remains robust under cross-database evaluation, which supports its practical use for targeted repair and selective execution. This node-centric, semantically aware approach to uncertainty estimation offers an interpretable and effective alternative to aggregate sequence-level confidence measures, with direct implications for real-world Text-to-SQL pipelines.
Abstract
We present a practical framework for detecting errors in LLM-generated SQL by estimating uncertainty at the level of individual nodes in the query's abstract syntax tree (AST). Our approach proceeds in two stages. First, we introduce a semantically aware labeling algorithm that, given a generated SQL query and a gold reference, assigns node-level correctness labels without over-penalizing structural containers or alias variation. Second, we represent each node with a rich set of schema-aware and lexical features (capturing identifier validity, alias resolution, type compatibility, scope ambiguity, and typo signals) and train a supervised classifier to predict per-node error probabilities. We interpret these probabilities as calibrated uncertainty, enabling fine-grained diagnostics that pinpoint where a query is likely to be wrong. Across multiple databases and datasets, our method substantially outperforms token log-probabilities: average AUC improves by 27.44%, and performance remains robust under cross-database evaluation. Beyond serving as an accuracy signal, node-level uncertainty supports targeted repair, human-in-the-loop review, and downstream selective execution. Together, these results establish node-centric, semantically grounded uncertainty estimation as a strong and interpretable alternative to aggregate sequence-level confidence measures.
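The abstract names the kinds of schema-aware signals in the featurization but not how they are computed. As a minimal sketch, assuming a toy schema dictionary and an already-parsed column-reference node, a few such signals (identifier validity, alias resolution, type compatibility, scope ambiguity, typo) might be computed with the Python standard library as below. `ColumnRef`, `column_features`, the feature names, and the 0.8 similarity cutoff are all hypothetical illustrations, not the paper's 72 features or its implementation; vectors like these would then be fed to a trained classifier such as the gradient-boosted tree mentioned in the TL;DR.

```python
from dataclasses import dataclass
import difflib
from typing import Dict, List, Optional

# Hypothetical toy schema: table name -> {column name: SQL type}.
Schema = Dict[str, Dict[str, str]]


@dataclass
class ColumnRef:
    """Hypothetical column-reference AST node, e.g. `u.name` or a bare `name`."""
    qualifier: Optional[str]  # table name or alias as written, if any
    column: str


def column_features(
    node: ColumnRef,
    schema: Schema,
    scope: Dict[str, str],
    expected_type: Optional[str] = None,
    typo_cutoff: float = 0.8,  # assumed similarity threshold, not from the paper
) -> Dict[str, float]:
    """Illustrative schema-aware features for one column node.

    `scope` maps every alias or table name usable in the query's FROM
    clause to the underlying table name (a table maps to itself).
    """
    feats: Dict[str, float] = {}
    if node.qualifier is not None:
        table = scope.get(node.qualifier)
        feats["alias_resolves"] = float(table is not None)
        candidates: List[str] = [table] if table is not None else []
    else:
        feats["alias_resolves"] = 1.0  # no qualifier, nothing to resolve
        candidates = sorted(set(scope.values()))

    # Tables among the candidates that actually define this column.
    owners = [t for t in candidates if node.column in schema.get(t, {})]
    feats["identifier_valid"] = float(bool(owners))
    # An unqualified column defined by more than one in-scope table.
    feats["ambiguous_scope"] = float(node.qualifier is None and len(owners) > 1)

    # Type compatibility against an expected type, when the context supplies one.
    if expected_type is None:
        feats["type_compatible"] = 1.0
    else:
        feats["type_compatible"] = float(
            any(schema[t][node.column] == expected_type for t in owners)
        )

    # Typo signal: the name is invalid but a close spelling exists in scope.
    known = sorted({c for t in candidates for c in schema.get(t, {})})
    near = difflib.get_close_matches(node.column, known, n=1, cutoff=typo_cutoff)
    feats["typo_signal"] = float(not owners and bool(near))
    return feats


schema = {
    "users": {"id": "INT", "name": "TEXT"},
    "orders": {"id": "INT", "user_id": "INT", "total": "REAL"},
}
scope = {"u": "users", "o": "orders", "users": "users", "orders": "orders"}

print(column_features(ColumnRef("o", "totl"), schema, scope))
# {'alias_resolves': 1.0, 'identifier_valid': 0.0, 'ambiguous_scope': 0.0,
#  'type_compatible': 1.0, 'typo_signal': 1.0}
```

In this toy example, the misspelled `o.totl` resolves its alias but is not a valid identifier, and `difflib` finds `total` as a close match, so the typo signal fires. An unqualified `id` would instead be flagged as ambiguous because both in-scope tables define it.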
